perm filename CLNTLN.MSG[COM,LSP]21 blob sn#871952 filedate 1989-04-10 generic text, type C, neo UTF8
∂17-Dec-87  1712	CL-Characters-mailer 	test    
Received: from IBM.COM by SAIL.STANFORD.EDU with TCP; 17 Dec 87  17:12:33 PST
Date: Thu, 17 Dec 87 11:35:57 PST
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <871217.113557.baggins@IBM.com>
Subject: test

  test of new router name

∂17-Dec-87  1809	CL-Characters-mailer 	mailbox name change, JEIDA interaction,  sub-topics  
Received: from IBM.COM by SAIL.STANFORD.EDU with TCP; 17 Dec 87  18:09:01 PST
Date: Thu, 17 Dec 87 17:46:49 PST
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <871217.174649.baggins@IBM.com>
Subject: mailbox name change, JEIDA interaction,  sub-topics
Subject: new mailbox router is now operational

As evidenced by the rejected message below,
cl-natural-languages is no more.  please use cl-characters.

Regards,
  Thom

------------------------------------------------------------


Date: 17 Dec 87 10:59:48
From: Mailer-Daemon at IBM.COM
To: BAGGINS

IBM.COM Mail Server unable to deliver the following mail to:cl-natural-languages
Reason:
Negative reply from Host:sail.stanford.edu
550 I don't know anybody named cl-natural-languages

           ** Text of Mail follows **
Date: Thu, 17 Dec 87 10:31:39 PST
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee"
    <cl-natural-languages@sail.stanford.edu>
Message-ID: <871217.103139.baggins@IBM.com>
Subject: mailbox name change, JEIDA interaction,  sub-topics

  Sometime soon, our router at stanford will change to cl-characters.
I'll broadcast as soon as I determine it is operational.

  My counterpart at the IBM Tokyo Research Lab presented the IBM
character extensions proposal at a JEIDA meeting in Nov.  JEIDA knows
that this has not yet been discussed by our ANSI committee.

  Per our discussion at the Ft Collins meeting, I am inviting ISO&JEIDA
to join our conferencing (via the stanford router as soon as the
new name is in effect).

  Larry made the reasonable suggestion that we decide
on the sub-topics of the proposals and deal with each (initially)
somewhat independently.

  Hopefully, everyone has a copy of the proposal material by now!
Let me know if not and I will ship a copy asap.

  My stab at sub-topics is:

     Type hierarchy
        eg. thin-string

     Explicit character set manipulation
        eg. define-char-set

     Equivalence
        eg. define-equivalence-class

     I/O interface
        eg. print-width

     Character set (or subset) predicates
        eg. jcl:jis-char-p


  ?other suggestions?





Happy Holidays,
  Thom

∂21-Dec-87  1918	CL-Characters-mailer 	Network communications 
Received: from IBM.COM by SAIL.STANFORD.EDU with TCP; 21 Dec 87  10:59:00 PST
Date: Mon, 21 Dec 87 10:13:40 PST
From: Thom Linden <baggins@ibm.com>
To: "Dr. Takayasu Ito" <tito%aoba.aoba.tohoku.junet@relay.cs.net>,
    "Dr. Taiichi Yuasa" <yuasa%kurims.kurims.kyoto-u.junet@relay.cs.net>
cc: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <871221.101340.baggins@IBM.com>
Subject: Network communications

  The ANSI subcommittee handling character issues communicates
over the networks via a broadcast node (cl-characters) at Stanford.
You and/or the interested members of your committees are encouraged
to participate in these conversations.  If you inform me of the
appropriate net ids, I will have them added to the distribution
list.

Regards,
  Thom Linden

∂22-Dec-87  0600	CL-Characters-mailer 	Type hierarchy    
Received: from XEROX.COM by SAIL.STANFORD.EDU with TCP; 22 Dec 87  06:00:07 PST
Received: from Cabernet.ms by ArpaGateway.ms ; 22 DEC 87 06:01:12 PST
Date: 22 Dec 87 05:59 PST
From: Masinter.pa@Xerox.COM
Subject: Type hierarchy
In-reply-to: Thom Linden <baggins@ibm.com>'s message of Thu, 17 Dec 87 17:46:49
 PST
To: cl-characters@sail.stanford.edu
Message-ID: <871222-060112-6764@Xerox>

I've spent some time thinking about this:

I think it is a fundamental error, an unacceptable incompatible change, to
change the Common Lisp type STRING to be something other than (VECTOR
STRING-CHAR), as is suggested by all of the extant proposals.

I think one of our fundamental design goals is that the extended language
features being proposed be in fact extensions, in that current CL functions not
be in error.

Currently, you can assume after (TYPEP x 'STRING) that X can hold any
STRING-CHAR element. Allowing STRING to denote several different types of vector
whose element types are < STRING-CHAR would violate that assumption.

It isn't necessary to change STRING in an incompatible way, however. What is
really the intent of these proposals is to extend the various functions in CL
that currently take "STRING" to also allow them to take other types as well.

Suppose we define a new type

(defun character-vector-p (x) 
   (and (vectorp x) (subtypep (array-element-type x) 'string-char)))

(deftype character-vector () '(satisfies character-vector-p))

Now extend all functions that take strings as input arguments and have them
accept any kind of character-vector. 
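
[Annotation, not part of the original message.]  The predicate above can
be tried in an ANSI Common Lisp roughly as follows.  This is only a sketch:
STRING-CHAR does not exist in ANSI Common Lisp, so CHARACTER stands in for
it here, and COUNT-VOWELS is a hypothetical example of a function
"extended" to accept any character-vector.

```lisp
;; Sketch only.  Assumption: CHARACTER replaces the STRING-CHAR of the
;; original message, since ANSI Common Lisp has no STRING-CHAR type.
(defun character-vector-p (x)
  "True if X is a vector whose element type is a subtype of CHARACTER."
  (and (vectorp x)
       ;; SUBTYPEP returns two values; keep only the first.
       (values (subtypep (array-element-type x) 'character))))

(deftype character-vector () '(satisfies character-vector-p))

;; Hypothetical function that accepts any character-vector, not only
;; one particular representation of strings.
(defun count-vowels (s)
  (check-type s character-vector)
  (count-if (lambda (c) (find c "aeiouAEIOU")) s))
```

For instance, (count-vowels "hello") counts the E and the O, while
(character-vector-p #(1 2 3)) is false because the element type of a
general vector is not a subtype of CHARACTER.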

∂29-Dec-87  1449	CL-Characters-mailer 	Type hierarchy    
Received: from SCRC-RIVERSIDE.ARPA by SAIL.STANFORD.EDU with TCP; 29 Dec 87  14:49:26 PST
Received: from LM1.NSC.DIALNET.SYMBOLICS.COM by Riverside.SCRC.Symbolics.COM via DIAL with SMTP id 214218; 29 Dec 87 12:52:39 EST
Received: from LM2.NSC.Dialnet.Symbolics.COM by LM1.NSC.Dialnet.Symbolics.COM via CHAOS with CHAOS-MAIL id 19079; Tue 29-Dec-87 23:37:20 JST
Date: Tue, 29 Dec 87 23:37 JST
From: Carl Hoffman <CWH@LM1.NSC.Dialnet.Symbolics.COM>
Subject: Type hierarchy
To: Masinter.pa@Xerox.COM, CL-Characters@SAIL.Stanford.EDU
cc: Shiota@LM1.NSC.Dialnet.Symbolics.COM
In-Reply-To: <871222-060112-6764@Xerox>
Message-ID: <871229233714.3.CWH@LM2.NSC.Dialnet.Symbolics.COM>

    Date: 22 Dec 87 05:59 PST
    From: Masinter.pa@Xerox.COM

    I think it is a fundamental error, an unacceptable incompatible change, to
    change the Common Lisp type STRING to be something other than (VECTOR
    STRING-CHAR), as is suggested by all of the extant proposals.

Why do you feel that this is a fundamental error?  In the Symbolics Genera 7.1
implementation, the type STRING is the same as (OR (VECTOR STRING-CHAR) (VECTOR
CHARACTER)).  As far as I can tell, this hasn't caused a major compatibility
problem.  The CL programs I've seen which use strings have all run in the
Symbolics implementation without modification.

The Symbolics implementation returns the following results:

(TYPEP (MAKE-ARRAY 1 :ELEMENT-TYPE 'CHARACTER) '(VECTOR STRING-CHAR)) -> NIL
(TYPEP (MAKE-ARRAY 1 :ELEMENT-TYPE 'STRING-CHAR) '(VECTOR CHARACTER)) -> NIL
(TYPEP (MAKE-ARRAY 1 :ELEMENT-TYPE 'CHARACTER) 'STRING)               -> T
(STRINGP (MAKE-ARRAY 1 :ELEMENT-TYPE 'CHARACTER))                     -> T
(TYPEP (MAKE-ARRAY 1 :ELEMENT-TYPE 'STRING-CHAR) 'STRING)             -> T
(STRINGP (MAKE-ARRAY 1 :ELEMENT-TYPE 'STRING-CHAR))                   -> T

MAKE-ARRAY ELEMENT-TYPE 'STRING-CHAR returns an array which allocates 8 bits
per character.  MAKE-ARRAY ELEMENT-TYPE 'CHARACTER returns an array which
allocates 28 bits per character (16 bits of code, 8 bits of font, and 4 bits of
modifier).

I believe that the current plan is to change MAKE-ARRAY ELEMENT-TYPE
'STRING-CHAR to return an array which allocates 16 bits per character (for 16
bits of code) and to use MAKE-ARRAY ELEMENT-TYPE 'STANDARD-CHAR to do what is
currently done with MAKE-ARRAY ELEMENT-TYPE 'STRING-CHAR.

Incidentally, I haven't heard any discussion of Moon's proposal that we simply
use the type STANDARD-CHAR to mean "lowest overhead character storage class"
rather than introducing a new type THIN-CHAR or INTERNAL-THIN-CHAR.

    Currently, you can assume after (TYPEP x 'STRING) that X can hold any
    STRING-CHAR element. Allowing STRING to denote several different types of vector
    whose element types are < STRING-CHAR would violate that assumption.

Why not just declare that assumption obsolete, and replace it with the
assumption that if (TYPEP X '(VECTOR STRING-CHAR)) then X can hold any
STRING-CHAR element.  Can you give me some examples of code which make use of
your assumption?

    It isn't necessary to change STRING in an incompatible way, however. What is
    really the intent of these proposals is to extend the various functions in CL
    that currently take "STRING" to also allow them to take other types as well.

That is only part of the intent.  It is also important that the following
forms return T.  (Assume that # represents a Japanese character.)

  (STRINGP "#")
  (TYPEP "#" 'STRING)
  (TYPEP (CHAR "#" 0) 'STRING-CHAR)

If the above forms do not return T, then many CL programs originally written to
handle only standard characters will not work when running in an environment
which has Japanese characters.  A major goal of this proposal is to allow these
programs to run without modification.  I can show you many programs which
require that the above forms return T.

    Suppose we define a new type

    (defun character-vector-p (x) 
       (and (vectorp x) (subtypep (array-element-type x) 'string-char)))

    (deftype character-vector () '(satisfies character-vector-p))

    Now extend all functions that take strings as input arguments and have them
    accept any kind of character-vector. 

If you replace STRING-CHAR in your example with CHARACTER, then this is exactly
the same as what Symbolics has already done with the STRINGP function and the
STRING data type.
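
[Annotation, not part of the original message.]  The table of TYPEP and
STRINGP results above can be reproduced on a present-day implementation
with a sketch like the one below.  Assumptions: BASE-CHAR stands in for the
narrow STRING-CHAR arrays described in the message, and
STRING-TYPE-REPORT is a hypothetical helper name.  The two STRINGP answers
are fixed by ANSI Common Lisp; the two (VECTOR ...) answers depend on the
implementation.

```lisp
;; Sketch only.  Assumption: BASE-CHAR plays the role of the narrow
;; STRING-CHAR arrays in the Symbolics example.
(defun string-type-report ()
  "Return a plist of type queries on a wide and a narrow character vector."
  (let ((wide   (make-array 1 :element-type 'character :initial-element #\a))
        (narrow (make-array 1 :element-type 'base-char :initial-element #\a)))
    (list :wide-is-string        (stringp wide)
          :narrow-is-string      (stringp narrow)
          ;; Implementation-dependent: both are true where BASE-CHAR and
          ;; CHARACTER upgrade to the same array element type.
          :wide-is-narrow-vector (typep wide '(vector base-char))
          :narrow-is-wide-vector (typep narrow '(vector character)))))
```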


∂06-Jan-88  2217	CL-Characters-mailer 	Re: Type hierarchy
Received: from XEROX.COM by SAIL.STANFORD.EDU with TCP; 6 Jan 88  22:17:01 PST
Received: from Cabernet.ms by ArpaGateway.ms ; 06 JAN 88 22:17:43 PST
Date: 6 Jan 88 22:16 PST
From: Masinter.pa@Xerox.COM
Subject: Re: Type hierarchy
In-reply-to: Carl Hoffman <CWH@LM1.NSC.Dialnet.Symbolics.COM>'s message of Tue,
 29 Dec 87 23:37 JST
To: CWH@LM1.NSC.Dialnet.Symbolics.COM
cc: Masinter.pa@Xerox.COM, CL-Characters@SAIL.Stanford.EDU,
 Shiota@LM1.NSC.Dialnet.Symbolics.COM
Message-ID: <880106-221743-6432@Xerox>

I've composed several replies and not sent them. My time is getting tight so I
have to send something. The problem is, can you have something that is a string
for which it is illegal to store a string-char into it?  No, in SCL. But if you
allow (vector standard-char) to also be a subtype of string, then you can have
vectors that can only hold standard-char and not string-char.

However, on even further reflection, there are many "read-only" strings, e.g.,
strings as program constants, for which it is an error to store *anything*. 

If we remove char-bits and char-font, we can get rid of the distinction between
string-char and character. This would be an improvement.

Most of the stuff in CLtL about the string type can in fact simply be removed,
while simplifying the language.

∂07-Jan-88  2036	CL-Characters-mailer 	X3J13 meeting in March 
Received: from IBM.COM by SAIL.STANFORD.EDU with TCP; 7 Jan 88  20:36:20 PST
Date: Wed, 06 Jan 88 09:58:06 PST
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880106.095806.baggins@IBM.com>
Subject: X3J13 meeting in March

  I have arranged for our subcommittee to meet at the IBM Almaden
Research Centre on 14,15,18 March.  Please let me know if this
poses any difficulties.  Also, please let me know if your travel
arrangements or other commitments prevent your attending all or
part.

  ARC is south of Palo Alto, roughly a 40 to 50 min commute.
I would suggest our meetings begin at 10am to allow missing most
of the morning freeway congestion.  I'll provide more detailed
directions later.

Regards,
  Thom

∂07-Jan-88  2036	CL-Characters-mailer 	subcommittee mailing list   
Received: from IBM.COM by SAIL.STANFORD.EDU with TCP; 7 Jan 88  20:36:05 PST
Date: Tue, 05 Jan 88 21:42:40 PST
From: Thom Linden <baggins@ibm.com>
To: "Richard P. Gabriel" <rpg@sail.stanford.edu>
cc: "Dr. Takayasu Ito" <ito%ito.aoba.tohoku.junet@relay.cs.net>,
    "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880105.214240.baggins@IBM.com>
Subject: subcommittee mailing list

Dick,
  Please add the following individuals to the character subcommittee
mailing list:

  Yuasa:     yuasa@tutics.tut.junet
  Umemura:   umemura@nuesun.NTT.junet
  Kurokawa:  KUROKAWA%jpntscvm.bitnet%wiscvm.wisc.edu
  Yasumura:  yasumura@harl86.harl.hitachi.junet

Regards,
  Thom

∂07-Jan-88  2036	CL-Characters-mailer 	Comments on IBM Proposal from Dave Unitas (LUCID)    
Received: from IBM.COM by SAIL.STANFORD.EDU with TCP; 7 Jan 88  20:36:41 PST
Date: Wed, 06 Jan 88 12:50:38 PST
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880106.125038.baggins@IBM.com>
Subject: Comments on IBM Proposal from Dave Unitas (LUCID)

  I have attached some comments on the proposal compiled by
Dave Unitas at LUCID.

  Both A and C seem to be good suggestions.


--------------------------------------------------------------------------------

A. Each character set is identified by its Character Set Name, a symbol,
   and an associated Character Set Number, a positive integer. (Replace
   CSID by Character Set Name or Character Set Number throughout the
   document).

   Replace char-split and char-join with:

     char-code-point char-code

   takes a character code and returns the component code-point.

     char-code-set char-code

   takes a character code and returns the component character set.

     make-char-code code-point &optional (character-set 0)

   takes a code-point and an optional character set and returns the
   character code.  The character set may be specified either as a
   Character Set Name or Character Set Number.


   Rename define-char-set to be define-character-set.  Make the arguments
   keywords rather than positionals.  If character-set-number is not
   specified, it is assigned from an available character set number
   below character-set-limit.

   Note:  Lucid as a whole is as yet undecided about whether user-
   defined character sets are generally useful enough to need to be
   included in the language.

B. We are still unsure about whether the type system should be extended
   to include extended strings of a particular character set or sets.

C. When printing an extended character to a stream which only accepts
   base characters, it is printed in the form

   #\name:xxxx

   where name identifies the character set of the character, and xxxx
   is the code-point of the character in hex.  Strings containing
   extended characters are printed in the following form when written
   to a base-character only stream:

   #( char0 char1 char2 ...)

   with charn as above, following the standard Common Lisp vector
   printing convention.
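
[Annotation, not part of the original message.]  The three functions
suggested in item A could look roughly like this.  It is a sketch under
stated assumptions, not the proposal's definition: the 16-bit code-point
width, the packing of the character set number above the code point, and
the name table *CHARACTER-SET-NUMBERS* are all invented for illustration.

```lisp
;; Sketch only.  The bit layout and the name table are assumptions;
;; the comments above do not specify either.
(defparameter *code-point-bits* 16
  "Assumed width of a code point within a character code.")

(defparameter *character-set-numbers* '((:base . 0) (:jis . 1))
  "Hypothetical mapping from Character Set Name to Character Set Number.")

(defun character-set-number (designator)
  "Accept either a Character Set Number or a Character Set Name."
  (etypecase designator
    (integer designator)
    (symbol (or (cdr (assoc designator *character-set-numbers*))
                (error "Unknown character set ~S" designator)))))

(defun make-char-code (code-point &optional (character-set 0))
  "Combine a code point and a character set into a character code."
  (assert (<= 0 code-point (1- (ash 1 *code-point-bits*))))
  (logior (ash (character-set-number character-set) *code-point-bits*)
          code-point))

(defun char-code-point (code)
  "Return the code-point component of the character code CODE."
  (ldb (byte *code-point-bits* 0) code))

(defun char-code-set (code)
  "Return the character set number component of the character code CODE."
  (ash code (- *code-point-bits*)))
```

With this layout, (char-code-set (make-char-code #x3042 :jis)) gives back
the number registered for :JIS, and a code made with the default character
set 0 is equal to its code point.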

∂10-Jan-88  0010	CL-Characters-mailer 	Re: Comments on IBM Proposal from Dave Unitas (LUCID)
Received: from Xerox.COM by SAIL.Stanford.EDU with TCP; 10 Jan 88  00:10:01 PST
Received: from Cabernet.ms by ArpaGateway.ms ; 10 JAN 88 00:10:39 PST
Date: 10 Jan 88 00:09 PST
From: Masinter.pa@Xerox.COM
Subject: Re: Comments on IBM Proposal from Dave Unitas (LUCID)
In-reply-to: Thom Linden <baggins@ibm.com>'s message of Wed, 06 Jan 88 12:50:38
 PST
To: baggins@ibm.com
cc: cl-characters@sail.stanford.edu
Message-ID: <880110-001039-3219@Xerox>

I think one of the problems with the discussion so far is that we've not agreed
really on the fundamental issue of whether the standard is for an optional
extension or for a required part of the standard.

For the record, I think that we should be designing things that are a required
part of every Common Lisp implementation. That is, every function, variable,
etc. in our standard should be in every Common Lisp implementation.  In some
implementations, the characters they work on are of course only 7 or 8-bit
ASCII, but all of the functions are there, and if the implementation has more
characters or Japanese characters, the same code will work.

If this is a required part of Common Lisp, we should try to keep to a minimum
the number of new functions, variables, and behaviors we expect from a Common
Lisp implementation. 

I don't think that the introduction of new functions and variables for dealing
with character sets really fits that criterion. The only situation where allowing
exposure to multiple character sets within a single implementation makes sense
is one in which the host operating system does not contain facilities to do
character set translation, and yet the programmer is unwilling (using binary
read-byte write-byte) to do that character set translation directly. This seems
like an extremely narrow application domain for the dozens of functions and
variables which exist in the IBM proposal.

= = = = 

As a side note, the IBM proposal contains a fairly serious design flaw: the
Common Lisp design is generally careful to avoid having dynamically modifiable
global state that isn't rebindable; e.g., although you can change macro
characters, all changes happen to *readtable*, etc.  Yet the character code
equivalency tables in the IBM proposal are global and not yet bindable. Even if
this isn't part of the standard but an internal library for you, you should fix
it.

= = = = = =
About the type system: the discussion on Common-Lisp@sail.stanford.edu on array
element type upgrading is relevant to the type hierarchy here. Suppose arrays
remember their element type. Redefine (stringp x) = (and (vectorp x) (subtypep
(array-element-type x) 'character)). 

If you want to make a string that consists of only (capital) vowels, you can
say
(make-array 10 :element-type '(member #\A #\E #\I #\O #\U)).
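
[Annotation, not part of the original message.]  How much of the requested
element type an array "remembers" can be inspected directly.  The sketch
below uses only standard functions; VOWEL-STRING-REPORT is a hypothetical
helper name.  In ANSI Common Lisp the array counts as a string, but the
element type it reports is the upgraded type, which is usually wider than
the MEMBER type requested.

```lisp
;; Sketch only: compare the requested element type with what the
;; implementation actually stores.
(defun vowel-string-report ()
  "Return a plist describing a vector created to hold capital vowels."
  (let* ((vowels '(member #\A #\E #\I #\O #\U))
         (v (make-array 10 :element-type vowels :initial-element #\A)))
    (list :requested vowels
          ;; What the implementation upgrades the request to.
          :upgraded (upgraded-array-element-type vowels)
          ;; What the array itself reports afterwards.
          :actual (array-element-type v)
          :stringp (stringp v))))
```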



= = = = = = =
Re: "C. When printing an extended character to a stream which only accepts
   base characters, it is printed in the form ... Strings containing
   extended characters are printed in the following form when written
   to a base-character only stream ..."

how are symbols that contain extended characters printed?

What happens when you call PRINC (which is supposed to not include the #\)?

I think this is a bad design. If you want to write extended characters on a base
stream, you should design a character-by-character encoding with escape
characters, and have the write-char primitive for the base stream turn the
extended characters (and the escape) into an escaped character sequence. These
alternative print sequences only handle a small percentage of the situations.
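
[Annotation, not part of the original message.]  A character-by-character
escape encoding of the kind suggested here might be sketched as follows.
Everything specific is an assumption made for illustration: the escape
character ~, the "escape, hex code, semicolon" format, and the rule that
codes below 128 count as base characters.

```lisp
;; Sketch only.  The escape character and the sequence format are
;; assumptions; the message does not propose a concrete encoding.
(defparameter *escape-char* #\~
  "Hypothetical escape character for the base-character stream.")

(defun write-escaped-char (char stream)
  "Write CHAR to STREAM using only base characters."
  (cond ((char= char *escape-char*)
         ;; The escape character itself is written doubled.
         (write-char *escape-char* stream)
         (write-char *escape-char* stream))
        ((< (char-code char) 128)
         ;; Assumed to be a base character: written unchanged.
         (write-char char stream))
        (t
         ;; Extended character: escape, code in hex, terminating semicolon.
         (format stream "~C~X;" *escape-char* (char-code char)))))

(defun escape-string (string)
  "Return STRING encoded with WRITE-ESCAPED-CHAR."
  (with-output-to-string (out)
    (loop for char across string
          do (write-escaped-char char out))))
```

Because the encoding works one character at a time, the same routine
serves symbols, strings, and PRINC output alike, which is the point made
against the #\name:xxxx notation.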





∂22-Jan-88  0005	CL-Characters-mailer 	Equivalence binding    
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 22 Jan 88  00:05:45 PST
Date: Thu, 21 Jan 88 23:56:14 PST
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880121.235614.baggins@IBM.com>
Subject: Equivalence binding

Larry's comment on the binding of equivalency tables is well taken.
Our view of the expected usage of these tables plus trying to keep
the proposed changes to a minimum argued against bindable tables.
Language consistency argues the other way.  The introduction of
an equivalencetable object and associated global *equivalencetable*
variable would make this more in line with the 'spirit' of CL.
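
[Annotation, not part of the original message.]  A bindable equivalence
table in the style of *READTABLE* could be sketched like this.  Only the
variable name *EQUIVALENCETABLE* comes from the message; the hash-table
representation and the helpers MAKE-EQUIVALENCETABLE and CHAR-EQUIVALENT-P
are hypothetical.

```lisp
;; Sketch only.  Assumption: a table maps each character to a
;; representative of its equivalence class.
(defvar *equivalencetable* (make-hash-table)
  "Current equivalence table; rebind it with LET to change it locally.")

(defun make-equivalencetable (classes)
  "Build a table from CLASSES, a list of lists of equivalent characters."
  (let ((table (make-hash-table)))
    (dolist (class classes table)
      (dolist (char class)
        ;; The first member of each class serves as its representative.
        (setf (gethash char table) (first class))))))

(defun char-equivalent-p (a b)
  "True if A and B are equivalent under the current *EQUIVALENCETABLE*."
  (char= (gethash a *equivalencetable* a)
         (gethash b *equivalencetable* b)))
```

Within (let ((*equivalencetable* (make-equivalencetable '((#\a #\A)))))
...) the characters a and A compare as equivalent; outside that form the
global table is untouched, which is the rebinding behavior asked for.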

∂22-Jan-88  0137	CL-Characters-mailer 	redefining STANDARD-CHAR    
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 22 Jan 88  01:37:17 PST
Date: Fri, 22 Jan 88 01:31:08 PST
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880122.013108.baggins@IBM.com>
Subject: redefining STANDARD-CHAR

  Carl's comment on STANDARD-CHAR == lowest overhead character
storage class is precisely what 'base-character' was defined to
be.  The rationale for STANDARD-CHAR being the small set of 96
glyphs is based on portability. Programs constrained to the
limited set are likely to be portable across a larger range of
systems and architectures.  While this is probably true (can
anyone testify to this?), it may not warrant a unique type.

  Other languages typically define a set of 'standard' characters
used for the construction of programs.  Does anyone know of a language
other than Lisp which equates this set with a unique type?

  I think distinguishing this 'lowest overhead storage class'
type is essential.  The distinction must be made for efficiency reasons.
It's unacceptable to force the use of 16bit cells for all
characters in multi-lingual environments.

∂22-Jan-88  0202	CL-Characters-mailer 	Type Hierarchies  
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 22 Jan 88  02:01:50 PST
Date: Fri, 22 Jan 88 01:52:46 PST
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880122.015246.baggins@IBM.com>
Subject: Type Hierarchies

No one has mentioned Bob Kerns's document.  The type hierarchies
in the JEIDA, IBM and Kerns documents are essentially identical
(excepting thin vs. base, fat vs. extended, and Bob's user-extensions).

  Bob makes a valid point that the two-byte encodings may make way
for three, etc. later.  But, it seems best to hide that from the
language as much as possible.  I suggest that extended would always
mean the 'largest overhead character storage class'.

∂22-Jan-88  0224	CL-Characters-mailer 	Font    
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 22 Jan 88  02:24:34 PST
Date: Fri, 22 Jan 88 02:20:47 PST
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880122.022047.baggins@IBM.com>
Subject: Font

  Bob Kerns's paper contains a set of changes to eliminate char-font
and allows for some migratory behavior.  I think the [migration]
aids should not be made part of the standard but be suggestions as
bridges an implementation may provide.  I would like to get a straw vote
over the network on everyone else's opinion.

   In summary:  (I, not Bob, marked items [migration])


      13.1 Character Attributes

          {eliminate references to font}


[migration]   char-font-limit
                   The value of char-font-limit is 1, unless the
                   implementation implements the obsolete char-font
                   feature.

      13.2 Predicates on Characters

          {eliminate references to font}

      13.3 Character Construction and Selection

          {eliminate references to font}

[migration]   char-font
                   This function is obsolete, and returns 0 for
                   compatibility.

[migration]   make-char char &optional (bits 0) (font 0)
                   (font 0) exists for compatibility.

      13.4 Character Conversions

          {eliminate references to font}

[migration]   digit-char weight &optional (radix 10) (font 0)
                   (font 0) exists for compatibility.


∂22-Jan-88  0234	CL-Characters-mailer 	character set predicates    
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 22 Jan 88  02:34:12 PST
Date: Fri, 22 Jan 88 02:28:18 PST
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880122.022818.baggins@IBM.com>
Subject: character set predicates

Larry suggested to me that we not try to invent a
correct set of xxx-char-p's
eg. kanji-char-p, hiragana-char-p, greek-char-p  ..  etc. but
instead use the names listed in the ISO std character sets.
This sounds like a good idea  ..  now we only have to find the
list.  In fact, I imagine we can reference the ISO std without
having to incorporate the list into ANSI.


∂26-Jan-88  1928	CL-Characters-mailer 	Font    
Received: from REAGAN.AI.MIT.EDU by SAIL.Stanford.EDU with TCP; 26 Jan 88  19:28:16 PST
Received: from JONES.AI.MIT.EDU by REAGAN.AI.MIT.EDU via CHAOS with CHAOS-MAIL id 88658; Tue 26-Jan-88 22:28:00 EST
Date: Tue, 26 Jan 88 22:27 EST
From: Robert W. Kerns <RWK@AI.AI.MIT.EDU>
Subject: Font
To: baggins@ibm.com
cc: cl-characters@sail.stanford.edu
In-Reply-To: <880122.022047.baggins@IBM.com>
Message-ID: <880126222758.5.RWK@JONES.AI.MIT.EDU>

    Date: Fri, 22 Jan 88 02:20:47 PST
    From: Thom Linden <baggins@ibm.com>

      Bob Kerns's paper contains a set of changes to eliminate char-font
    and allows for some migratory behavior.  I think the [migration]
    aids should not be made part of the standard but be suggestions as
    bridges an implementation may provide.  I would like to get a straw vote
    over the network on everyone else's opinion.

This seems reasonable to me.  So far as anyone can tell, nobody
has ever implemented the Font field.  (I haven't checked with
Coral Software to see what they do on the Macintosh; that would seem
to me to be the place most likely to have done so.  I'll check with
them shortly.)

∂26-Jan-88  1942	CL-Characters-mailer 	Type Hierarchies  
Received: from REAGAN.AI.MIT.EDU by SAIL.Stanford.EDU with TCP; 26 Jan 88  19:42:10 PST
Received: from JONES.AI.MIT.EDU by REAGAN.AI.MIT.EDU via CHAOS with CHAOS-MAIL id 88661; Tue 26-Jan-88 22:42:01 EST
Date: Tue, 26 Jan 88 22:41 EST
From: Robert W. Kerns <RWK@AI.AI.MIT.EDU>
Subject: Type Hierarchies
To: baggins@ibm.com
cc: cl-characters@sail.stanford.edu
In-Reply-To: <880122.015246.baggins@IBM.com>
Message-ID: <880126224159.6.RWK@JONES.AI.MIT.EDU>

    Date: Fri, 22 Jan 88 01:52:46 PST
    From: Thom Linden <baggins@ibm.com>

    No one has mentioned Bob Kerns's document.  The type hierarchies
    in the JEIDA, IBM and Kerns documents are essentially identical
    (excepting thin vs. base, fat vs. extended, and Bob's user-extensions).

      Bob makes a valid point that the two-byte encodings may make way
    for three, etc. later.  But, it seems best to hide that from the
    language as much as possible.  I suggest that extended would always
    mean the 'largest overhead character storage class'.

The issue here is:  What should existing code, written using STRING and
STRING-CHAR mean?  Should code written in the most general current
fashion continue to mean the most general thing?  Or should it mean the
most efficient?

The assumption behind my proposal is that it should mean the most general,
and if you want a more specific, but more space-efficient, type, you use
a new name.

So far as Symbolics is concerned, having STRING-CHAR mean a more specific
type would be LESS of a problem, since in the current Symbolics software,
STRING-CHAR means the 1-byte kind of characters.

The trade-off, in terms of users' code, would be:

1)  If STRING-CHAR is more general, users' code will get less efficient when
an implementation implements the new standard, but will work for all input.

2)  If STRING-CHAR is more specific, users' code will retain their efficiency,
but may no longer work for the entire range of input found, say, in files or
other strings.

Whether case 2 would be viewed as an incompatibility or not depends on the
exact contract for the code in question.  For example, a file copy or string
utility would definitely be regarded as having been broken by the change,
while other code might be regarded as just not taking advantage of a new feature.

By the way, I should make my position in this clear.  I am no longer
affiliated with Symbolics.  While my opinions and views are probably
indicative of views there, and I have some influence and many contacts
there, I have no official connection, and my views are my own.  I
continue to be concerned with conventional as well as specialized
architectures, though.

∂04-Feb-88  0020	CL-Characters-mailer 	Forwarding note from Ito-san
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 4 Feb 88  00:19:52 PST
Date: Wed, 03 Feb 88 10:49:00 PST
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880203.104900.baggins@IBM.com>
Subject: Forwarding note from Ito-san



-------------------------------------------------------------


To: Thom Linden, Chairman of character subcommittee, Common Lisp, ANSI
From: Takayasu Ito, Chairman of Japanese SC22/Lisp WG and JEIDA
      committee on Lisp standardization

Subject: Comments on IBM Proposal "Common LISP - Proposed Extensions
        for International Character Set Handling" (Version 01.11.87)

We have received the proposal through Mr. T. Kurokawa, P-member of our
committee. Here is the summary of our comments compiled by him and
Dr. T. Yuasa. (More details may be obtained from them.)

1. Overall impression

We think this is an interesting proposal for initiating extensive
investigation of international character set handling. We need,
however, to continue to work on many aspects of this area.

2. We have had several meetings on this subject. The following is a list
of comments presented at these occasions.

Please note that these are not yet our committee's formal statement.
-- The locality of 'equivalence class' must be maintained as suggested
by Larry Masinter. A variable such as *equivalence-class* would do.
-- It is important to define the 'base' character set.
It is still under hot dispute, but one argues, for example, that
the base should be clearly defined as single-byte, and the extension
should be defined by each national standardization body.
Another says, in Japan, it
should be two-byte size or its maximum should be around 64K. We should
leave the actual implementation of the character to be
implementation-dependent so that US or Europe can enjoy the efficient
implementation of single byte size.
-- The implementation of 'equivalence class table' will be the key for
the efficiency.
-- The relationship between 'equivalence class' and 'readtable' or
'character macro' should be investigated further. We may be able to
reduce the primitives around these character input facilities.
-- The proposal may not be sufficiently abstract. To those with enough
experience of Common Lisp implementation, the document reflects too much
of a real (perhaps trial) implementation.
For example, csid for base is defined as '0' for US implementation.

3. Cooperation should be continued.
We regard our cooperation on international character set handling
as indispensable and fruitful. We would like to continue to exchange our
ideas on this subject.


∂04-Feb-88  1732	CL-Characters-mailer 	Re: X3J13 meeting in March  
Received: from Xerox.COM by SAIL.Stanford.EDU with TCP; 4 Feb 88  17:32:05 PST
Received: from Cabernet.ms by ArpaGateway.ms ; 04 FEB 88 17:32:11 PST
Date: 4 Feb 88 17:32 PST
From: Masinter.pa@Xerox.COM
Subject: Re: X3J13 meeting in March
In-reply-to: Thom Linden <baggins@ibm.com>'s message of Wed, 06 Jan 88 09:58:06
 PST
To: baggins@ibm.com
cc: cl-characters@sail.stanford.edu
Message-ID: <880204-173211-1471@Xerox>

Thom:

For all of those who are staying in Palo Alto for the duration of the meeting,
adding the 40-50 minute commute each way (for a total of 4.5 hours of commute
time) seems to be a considerable imposition.  

It would seem to pose far fewer difficulties for almost all of the subcommittee
members to hold the meetings in Palo Alto, since that is where the X3J13 meeting
is being held. 

Jan Zubkoff has offered to arrange meeting rooms in Palo Alto for subcommittee
meetings; why not take her up on the offer?

I've been on the road and just returned; I'm sorry for my late reply to this
message. 
It is likely that the cleanup committee will meet on Tuesday morning 15 March,
which would interfere with my attending a meeting in Palo Alto until 1 PM and at
ARC until 2 PM.


∂08-Feb-88  1143	CL-Characters-mailer 	subcommittee meeting   
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 8 Feb 88  11:39:33 PST
Date: Mon, 08 Feb 88 11:22:20 PST
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880208.112220.baggins@IBM.com>
Subject: subcommittee meeting

Larry has expressed interest in holding the subcommittee meetings
in Palo Alto to ease the commute.  What is the feeling of the
rest of the committee?  Please answer the following short
questionnaire:

   I am planning on attending the March meeting:  YES/NO
   Subcommittee meeting at Almaden (San Jose) is OK:  YES/NO/DONTCARE
   I will be available to attend subcommittee meetings from:

                            Date               Hours

                          14 Mar               9-4pm
                          15 Mar               9-4pm
                          18 Mar               9-4pm

Please respond by 11 Feb so I can make alternate arrangements if
necessary.

Regards,
  Thom

∂08-Feb-88  1818	CL-Characters-mailer 	subcommittee meeting   
Received: from AI.AI.MIT.EDU by SAIL.Stanford.EDU with TCP; 8 Feb 88  18:17:51 PST
Date: Mon,  8 Feb 88 21:18:15 EST
From: "Robert W. Kerns" <RWK@AI.AI.MIT.EDU>
Subject:  subcommittee meeting
To: baggins@IBM.COM
cc: cl-characters@SAIL.STANFORD.EDU
In-reply-to: Msg of Mon 08 Feb 88 11:22:20 PST from Thom Linden <baggins at ibm.com>
Message-ID: <323585.880208.RWK@AI.AI.MIT.EDU>

    Date: Mon, 08 Feb 88 11:22:20 PST
    From: Thom Linden <baggins at ibm.com>
    To:   X3J13: Character Subcommittee <cl-characters at sail.stanford.edu>
    Re:   subcommittee meeting

    Larry has expressed interest in holding the subcommittee meetings
    in Palo Alto to ease the commute.  What is the feeling of the
    rest of the committee?  Please answer the following short
    questionnaire:

       I am planning on attending the March meeting:  YES/NO
YES
       Subcommittee meeting at Almaden (San Jose) is OK:  YES/NO/DONTCARE
I would prefer Palo Alto.  I can handle San Jose; I do have
friends there I intend to visit, but Palo Alto would leave me
more flexibility.
       I will be available to attend subcommittee meetings from:

                                Date               Hours

                              14 Mar               9-4pm
                              15 Mar               9-4pm
                              18 Mar               9-4pm
Yes, so far as I know, but please, let's not consume 100%
of all three days!  I'll be suspicious of any work we do at that pace.

    Please respond by 11 Feb so I can make alternate arrangements if
    necessary.

    Regards,
      Thom

∂12-Feb-88  0620	CL-Characters-mailer 	Font    
Received: from XX.LCS.MIT.EDU by SAIL.Stanford.EDU with TCP; 12 Feb 88  06:20:19 PST
Received: from LIVE-OAK.LCS.MIT.EDU by XX.LCS.MIT.EDU via Chaosnet; 12 Feb 88 09:17-EST
Received: from ACORN.Gold-Hill.DialNet.Symbolics.COM by MIT-LIVE-OAK.DialNet.Symbolics.COM via DIAL with SMTP id 80237; 12 Feb 88 09:18:27-EST
Received: from BOSTON.Gold-Hill.DialNet.Symbolics.COM by ACORN.Gold-Hill.DialNet.Symbolics.COM via CHAOS with CHAOS-MAIL id 93908; Thu 11-Feb-88 05:30:59-EST
Date: Fri, 12 Feb 88 08:32 est
From: mike%acorn@oak.lcs.mit.edu
To: RWK@AI.AI.MIT.EDU
Subject: Font
Cc: baggins@ibm.com, cl-characters@sail.stanford.edu

          Bob Kerns's paper contains a set of changes to eliminate char-font
        and allow for some migratory behavior.  I think the [migration]
        aids should not be made part of the standard but be suggestions for
        bridges an implementation may provide.  I would like to get a straw
        vote over the network as to everyone else's opinion.
    
    This seems reasonable to me.  So far as anyone can tell, nobody
    has ever implemented the Font field.  


We don't implement font either. I think char-font should be dropped.
char-bits is more of a problem, but I think it should be dropped too.
For compatibility we should introduce a "non-standard" migration path
type called "keychord" to represent objects like #\c-m-s-h-S, etc.
The confusion between characters as codepoints in an implicit or 
explicit character set, and keyboard key combinations is one which is
incredibly useless and should go away. It is particularly troublesome
when you consider a Japanese keyboard sequence, where you need 
several keyboard and keychord hits to generate a character, and
bits doesn't correspond to keychords or "shifting" in any reasonable
way.



...mike beckerle
Gold Hill

    


∂16-Feb-88  1112	CL-Characters-mailer 	March meeting
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 16 Feb 88  11:12:09 PST
Date: Tue, 16 Feb 88 10:04:31 PST
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880216.100431.baggins@IBM.com>
Subject: March meeting

  In general, folks wanted to meet in the PA area.  I have requested
a meeting room for Monday 14 Mar 1-4pm and Tuesday 15 Mar 9-4pm.  I'll
relay the confirmation as soon as I have it.

Regards,
  Thom

∂16-Feb-88  1506	CL-Characters-mailer 	bits and charsets 
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 16 Feb 88  15:06:37 PST
Date: Tue, 16 Feb 88 12:39:18 PST
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880216.123918.baggins@IBM.com>
Subject: bits and charsets

    We don't implement font either. I think char-font should be dropped.
    char-bits is more of a problem, but I think it should be dropped too.
    For compatibility we should introduce a "non-standard" migration path
    type called "keychord" to represent objects like #\c-m-s-h-S, etc.
    The confusion between characters as codepoints in an implicit or
    explicit character set, and keyboard key combinations is one which is
    incredibly useless and should go away. It is particularly troublesome
    when you consider a Japanese keyboard sequence, where you need
    several keyboard and keychord hits to generate a character, and
    bits doesn't correspond to keychords or "shifting" in any reasonable
    way.


  In thinking about Mike's note, it occurs to me that explicit support
for character sets actually encompasses bits.  An implementation
could support a character set named 'meta-cyrillic', for example; this
could contain all the Cyrillic character combinations of alt, ctl, etc.,
and would be distinct from the non-distinguished Cyrillic characters.
Similarly, this could apply to any conventional character set an
implementation would choose to support.

∂19-Feb-88  1434	CL-Characters-mailer 	Re: bits and charsets  
Received: from Xerox.COM by SAIL.Stanford.EDU with TCP; 19 Feb 88  14:34:18 PST
Received: from Cabernet.ms by ArpaGateway.ms ; 19 FEB 88 14:24:22 PST
Date: 19 Feb 88 14:24 PST
From: Masinter.pa@Xerox.COM
Subject: Re: bits and charsets
In-reply-to: Thom Linden <baggins@ibm.com>'s message of Tue, 16 Feb 88 12:39:18
 PST
To: baggins@ibm.com
cc: cl-characters@sail.stanford.edu
Message-ID: <880219-142422-9739@Xerox>

Well, the most natural embedding of "bits" is just directly within the character
code space, with or without the character code equivalence space.

On the subject of character sets, I've thought of the following problem with any
kind of dynamic adjustment of character equivalence tables: hash tables which
hash by string-equal won't work if string-equal might depend either on some
dynamically changeable state or even a bindable state.
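
To make the hazard concrete, here is a minimal sketch in portable Common
Lisp.  The names *FOLD*, FOLD-STRING, PUT-KEY, GET-KEY and DEMO are invented
for illustration and belong to no proposal; *FOLD* merely stands in for a
bindable character equivalence table, and the table hashes on the folded
key much as a string-equal hash table would.

```lisp
(defvar *fold* #'char-upcase
  "Hypothetical stand-in for a dynamically bindable equivalence table.")

(defun fold-string (s)
  "Canonicalize S under whatever equivalence is currently in force."
  (map 'string *fold* s))

(defun put-key (table key value)
  "Store VALUE under the folded form of KEY."
  (setf (gethash (fold-string key) table) value))

(defun get-key (table key)
  "Look KEY up under the folded form computed at lookup time."
  (gethash (fold-string key) table))

(defun demo ()
  "Insert under one equivalence, then look up under two."
  (let ((table (make-hash-table :test #'equal)))
    (put-key table "abc" 1)
    (list (get-key table "ABC")            ; same equivalence: found
          (let ((*fold* #'identity))       ; equivalence rebound
            (get-key table "abc")))))      ; the very key we stored: lost
```

The entry was filed under the folded key that was current at insertion
time, so rebinding the equivalence afterwards silently orphans it.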


∂29-Feb-88  1304	CL-Characters-mailer 	subcommittee meeting   
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 29 Feb 88  13:04:37 PST
Date: Mon, 29 Feb 88 12:29:52 PST
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880229.122952.baggins@IBM.com>
Subject: subcommittee meeting

The characters subcommittee will meet from 9am-5pm on both
Monday, 14 Mar, and Tuesday, 15 Mar, in the Hyatt Delmonte room.

Regards,
  Thom

∂08-Mar-88  1417	CL-Characters-mailer 	Type Hierarchies  
Received: from XX.LCS.MIT.EDU by SAIL.Stanford.EDU with TCP; 8 Mar 88  14:16:28 PST
Received: from LIVE-OAK.LCS.MIT.EDU by XX.LCS.MIT.EDU via Chaosnet; 8 Mar 88 16:53-EST
Received: from ACORN.Gold-Hill.DialNet.Symbolics.COM by MIT-LIVE-OAK.DialNet.Symbolics.COM via DIAL with SMTP id 83562; 8 Mar 88 16:48:56-EST
Received: from BOSTON.Gold-Hill.DialNet.Symbolics.COM by ACORN.Gold-Hill.DialNet.Symbolics.COM via CHAOS with CHAOS-MAIL id 96137; Tue 8-Mar-88 14:56:37-EST
Date: Tue, 8 Mar 88 14:56 est
From: mike%acorn@oak.lcs.mit.edu (mike@gold-hill.com after 1-April-88)
COMMENTS: NOTE %acorn@oak... CHANGES TO @GOLD-HILL.COM ON 1-April-88
To: RWK@AI.AI.MIT.EDU
Subject: Type Hierarchies
Cc: baggins@ibm.com, cl-characters@sail.stanford.edu

        No one has mentioned Bob Kerns's document.  The type hierarchies
        in the JEIDA, IBM and Kerns documents are essentially identical
        (excepting thin vs. base, fat vs. extended, and Bob's user-extensions).
          Bob makes a valid point that the two-byte encodings may make way
        for three, etc. later.  But, it seems best to hide that from the
        language as much as possible.  I suggest that extended would always
        mean the 'largest overhead character storage class'.

In fact, my contacts in Japan assure me that more than 16 bits are needed.
The Japanese consider themselves to be the guardians of oriental
interests in these matters. Korean, Mandarin, etc. all require plenty
more than just 16 bits of code. Moreover, having just a two-level
hierarchy (8-bit char codes, or "extended") is egocentric, and just
shouldn't be done. 

My suggestion is that characters and their types be extended like
the UNSIGNED-BYTE type:

(CHARACTER 8) (CHARACTER 16) (CHARACTER 24) (CHARACTER 32)

where (TYPEP X '(CHARACTER <n>)) means 
(AND (TYPEP X 'CHARACTER)
     (TYPEP (CHAR-CODE X) '(UNSIGNED-BYTE <n>)))
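
As a rough sketch only, the same test can be written today as an ordinary
predicate, without any new type specifier.  CHARACTER-OF-WIDTH-P is a name
made up for this example; the (CHARACTER <n>) specifier itself does not
exist in any current implementation.

```lisp
(defun character-of-width-p (x n)
  "True if X is a character whose code fits in N bits.
Hypothetical helper mirroring the proposed (CHARACTER <n>) type test."
  (and (typep x 'character)
       (typep (char-code x) `(unsigned-byte ,n))))
```

For instance, (CHARACTER-OF-WIDTH-P #\A 8) is true in any implementation
whose code for #\A is below 256, while a non-character argument such as
the integer 65 is rejected by the first clause.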

    The issue here is:  What should existing code, written using STRING and
    STRING-CHAR mean?  Should code written in the most general current
    fashion continue to mean the most general thing?  Or should it mean the
    most efficient?

This should be solved the same way as for floating point numbers.
*READ-DEFAULT-FLOAT-FORMAT* determines the kind of floats the reader
creates and the printer prints by default.

*DEFAULT-CHARACTER-CODE-SIZE* (pick any name) can determine the 
default width.

This variable however, just affects the reader and printer and does
not authorize the compiler to do anything other than call generic
string operations.
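
The floating-point precedent can be sketched as follows.  READ-FLOAT-AS is
a helper invented for this note, and *DEFAULT-CHARACTER-CODE-SIZE* is only
the placeholder name suggested above; no reader consults it, so the DEFVAR
shows nothing more than the shape such a variable might take.

```lisp
(defun read-float-as (format text)
  "Read TEXT with *READ-DEFAULT-FLOAT-FORMAT* bound to FORMAT.
The binding changes only what the reader creates for unmarked floats."
  (let ((*read-default-float-format* format))
    (values (read-from-string text))))

(defvar *default-character-code-size* 8
  "Hypothetical reader/printer default width; not an existing variable.")
```

Here (READ-FLOAT-AS 'DOUBLE-FLOAT "1.0") yields a double-float and
(READ-FLOAT-AS 'SINGLE-FLOAT "1.0") a single-float, with no effect on how
compiled arithmetic is done; the character variable would be meant to
behave the same way.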

    The assumption behind my proposal is that it should mean the most general,
    and if you want a more specific, but more space-efficient, type, you use
    a new name.

I think it should be up to the implementation what the default value of this
parameter is. It would be nonsense to have a Common Lisp that is primarily
sold in Japan have the default be 8 bits, and similarly nonsense for
one sold in the US to have the default be 16 bits. Most applications do
not use strings of more than one kind, although clearly many will.

    So far as Symbolics is concerned, having STRING-CHAR mean a more specific
    type would be LESS of a problem, since in the current Symbolics software,
    STRING-CHAR means the 1-byte kind of characters.

STRING-CHAR, like CHARACTER as I've described it above, is a non-specific
type specifier, much like UNSIGNED-BYTE. Ultimately, I'd like to dump
CHAR-BITS in favor of a whole new concept which would be non-standard
generally, called KEY-CHORDS. CHAR-FONT is also out-the-window
as far as I'm concerned. Hence, I think there should be no
difference at all between STRING-CHAR and CHARACTER.

 
    The trade-off, in terms of users' code, would be:
    
    1)  If STRING-CHAR is more general, users' code will get less efficient when
    an implementation implements the new standard, but will work for all input.
    
Clearly, we need a global proclamation that says all strings 
contain characters of a certain width, so that one can set the reader
default, give the proclamation, then compile, with no loss of 
efficiency.  How about

(PROCLAIM '(CHAR-CODE-SIZE 8))
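
One caveat: as things stand CHAR-CODE-SIZE is not a declaration any
implementation knows, so the bare form above may draw a complaint about
an unrecognized declaration.  CLtL's DECLARATION proclamation exists for
exactly this case.  The sketch below registers the (still hypothetical)
name first, after which the proclamation is accepted and ignored.

```lisp
;; Register CHAR-CODE-SIZE, a hypothetical name from this note, as a
;; known declaration name so the proclamation below is accepted.
(proclaim '(declaration char-code-size))

;; The suggested global proclamation.  Without implementation support
;; it is accepted but has no effect on how strings are compiled.
(proclaim '(char-code-size 8))
```

Code written this way stays portable: an implementation that does
understand the proclamation can exploit it, and the rest ignore it.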


    2)  If STRING-CHAR is more specific, users' code will retain their efficiency,
    but may no longer work for the entire range of input found, say, in files or
    other strings.
    
    Whether case 2 would be viewed as an incompatibility or not depends on the
    exact contract for the code in question.  For example, a file copy or string
    utility would definitely be regarded as having been broken by the change,
    while other code might be regarded as just not taking advantage of a new feature.
    

    

...mike beckerle
Gold Hill

∂09-Mar-88  1230	CL-Characters-mailer 	subcommittee meeting   
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 9 Mar 88  12:29:55 PST
Date: Wed, 09 Mar 88 12:08:12 PST
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
cc: Jan Zubkoff <edsel!jlz@labrea.stanford.edu>
Message-ID: <880309.120812.baggins@IBM.com>
Subject: subcommittee meeting

Our meeting room has been changed from Del Monte to the Regency-2
room.  Also, I have some difficulty with a 9am start and would
like to change this to 10am.

Regards,
  Thom

∂09-May-88  0743	CL-Characters-mailer 	back from travel  
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 9 May 88  07:42:49 PDT
Date: Mon, 09 May 88 07:41:46 PDT
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880509.074146.baggins@IBM.com>
Subject: back from travel

  I'm back from several weeks in Europe.  This week I plan to draft
the changes discussed at the last PA meeting.  Some notes from the
last meeting are also forthcoming.

Regards,
  Thom

∂11-May-88  0833	CL-Characters-mailer 	june meeting 
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 11 May 88  08:33:15 PDT
Date: Wed, 11 May 88 08:27:00 PDT
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880511.082700.baggins@IBM.com>
Subject: june meeting

  The choices for a subcommittee meeting in June are 13, 14, 17.
I believe one day will be sufficient and have a high school graduation
the 17th.  So, from my end, June 14th is the reasonable selection.

  Please respond asap as to whether you can:

      1) attend the main x3j13 meeting (15,16 June)
      2) attend a 14 June subcommittee meeting
      3) prefer a different date(s) for the subcommittee meeting

Regards,
  Thom

∂16-May-88  1353	CL-Characters-mailer 	June meeting 
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 16 May 88  13:53:41 PDT
Date: Mon, 16 May 88 13:47:16 PDT
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880516.134716.baggins@IBM.com>
Subject: June meeting

I had one request for an evening meeting on June 14 and no other
indicated preferences.  I'm working on conference room arrangements
for that evening and will post a notice as soon as I have something
solid.
  I am planning on arriving on Monday evening, returning Wednesday pm
after the 1st day of x3j13.

Regards,
  Thom

∂16-May-88  1615	CL-Characters-mailer 	June meeting 
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 16 May 88  16:15:44 PDT
Date: Mon, 16 May 88 16:11:50 PDT
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880516.161150.baggins@IBM.com>
Subject: June meeting

  An evening meeting room at Symbolics is apparently not possible.
Our meeting will now take place from 10-5 in the 'Bermuda' room (so bring
your tanning lotion).
I'm willing to also meet again at 7 in the hotel (at least to review) if
anyone has difficulty making the earlier time.

Regards,
  Thom
========================================================================
Received: from  STONY-BROOK.SCRC.Symbolics.COM by IBM.COM on 05/16/88 at 14:32:48 PDT
Received: from PEGASUS.SCRC.Symbolics.COM by STONY-BROOK.SCRC.Symbolics.COM via CHAOS with CHAOS-MAIL id 405455; Mon 16-May-88 17:10
Received: by scrc-pegasus id AA00374; Mon, 16 May 88 16:49:46 edt
Date: Mon, 16 May 88 16:49:46 edt
From: Rosemary Bouzane <bouzane@scrc-pegasus>
To: baggins@ibm.com
Subject: Re:  subcommittee meeting

Since Symbolics is a secured building we cannot accommodate your
request for Tuesday evening.  However, I switched meetings around
so that we could do the following:

   The Character Committee can now meet at Symbolics in
   our Bermuda Conference Room - first floor 10:00-5:00.

∂20-May-88  0053	CL-Characters-mailer 	june meeting 
Received: from AI.AI.MIT.EDU by SAIL.Stanford.EDU with TCP; 20 May 88  00:53:17 PDT
Date: Fri, 20 May 88 03:57:48 EDT
From: "Robert W. Kerns" <RWK@AI.AI.MIT.EDU>
Subject:  june meeting
To: baggins@IBM.COM
cc: cl-characters@SAIL.STANFORD.EDU
In-reply-to: Msg of Wed 11 May 88 08:27:00 PDT from Thom Linden <baggins at ibm.com>
Message-ID: <381653.880520.RWK@AI.AI.MIT.EDU>

    Date: Wed, 11 May 88 08:27:00 PDT
    From: Thom Linden <baggins at ibm.com>

      Please respond asap as to whether you can:
          1) attend the main x3j13 meeting (15,16 June)
Yes.
          2) attend a 14 June subcommittee meeting
Yes.
          3) prefer a different date(s) for the subcommittee meeting
I would prefer a date sometime in July for the whole mess, but...

I believe Mike Beckerle will also attend, but he's off the net and
doesn't know the dates.  (Gold Hill hopes to be back on the net soon,
but they've been off for quite a while, apparently).

    Regards,
      Thom

∂25-Jun-88  1536	CL-Characters-mailer 	character proposal
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 25 Jun 88  15:36:26 PDT
Date: Sat, 25 Jun 88 15:32:26 PDT
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880625.153226.baggins@IBM.com>
Subject: character proposal

OK.  I have made (I think) the changes to the proposal as discussed
in Boston. There are still a couple of points which need further
discussion or documentation:

    Simple strings:  currently the document only specifies
          simple-base-string (simple-string is eliminated as
          ambiguous).

    External width:  I believe this is still needed and Dick
          Waters indicated some need for this type of function
          at the Boston meeting.  However this is still contested.

    Standard # Macro Character Syntax:  Is there a reasonable
          convention for 'named' extended characters.  Perhaps
          #\character-set:index.  For example #\JISxxx:234.

    ?? and probably others.


 At this point, I would like everyone to read the proposal
in depth.  There are two sections 1) the overview and 2)
the detail changes to CLtL.

Read the first section for completeness and accuracy.  Note it
doesn't have to cover every detail of change but needs to
say enough to understand the overall pattern of change.

For the second section, I suggest you mark up a fresh CLtL per the
proposal.  This will help verify the paragraph numbers!
Then review the entire CLtL for consistency, accuracy and completeness.
(It turned out characters hit quite a variety of places, some easy
to miss!).


  In all cases, please write up changes to the proposal in a
  complete manner.  I'm running out of time to type LaTeX.
  If you are willing to completely rewrite some section,
  feel free to do so (i.e. don't suggest I rewrite it).

  I used very few features of LaTeX to create the document.  I
  expect they will be self-explanatory.  Quiz me if not.

  We need to vote the document out of committee in sufficient
  time to distribute electronically before the next general
  meeting in October.  Therefore, I'm setting the first week
  in August as our final vote target.  In July, we need to
  discuss and vote on any sub-issues.  If there is strong
  opposition (or proposition) by any member on some aspect
  of the proposal, we'll bring it to a vote for settlement.

Gary Palter,  as Bob Kerns and Mike Beckerle seem to have
net access problems, I'm asking you to distribute copies
of the proposal to them. (Thanks in advance).  Let me know
if that poses any difficulties.


Any non-US colleagues listening into this discussion,
feel free to review the document as well!  Please note that
this is still a working document of the subcommittee.  Thus
your comments will probably have greater impact if given
now than later.

Regards,
  Thom

∂27-Jun-88  0747	CL-Characters-mailer 	character proposal
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 27 Jun 88  07:47:11 PDT
Date: Mon, 27 Jun 88 07:44:37 PDT
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880627.074437.baggins@IBM.com>
Subject: character proposal

Well, I arrived this morning to find the proposal returned by
the postmaster as being too big.  I'll work on a circumvention
this morning.

Regards,
  Thom

∂27-Jun-88  0845	CL-Characters-mailer 	part 1  
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 27 Jun 88  08:44:27 PDT
Date: Mon, 27 Jun 88 08:38:01 PDT
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880627.083801.baggins@IBM.com>
Subject: part 1

  Part one of the proposal is appended to this note.

Regards,
  Thom
-----------------------------------------------------------------------
\documentstyle{report}     % Specifies the document style.

\pagestyle{headings}

\title{\bf DRAFT DRAFT:
Extensions to Common LISP to Support International
Character Sets}
\author{
Michael Beckerle\thanks{Gold Hill} \and
Paul Beiser\thanks{Hewlett-Packard} \and
Carl Hoffman\thanks{ILA Associates} \and
Robert Kerns\thanks{Independent consultant} \and
Kevin Layer\thanks{Franz LISP} \and
Thom Linden\thanks{IBM Research, Subcommittee Chair} \and
Larry Masinter\thanks{XEROX Research} \and
etc
}
\date{June 24, 1988}   % Deleting this command produces today's date.

\begin{document}

\maketitle                 % Produces the title.

\setcounter{secnumdepth}{4}

\setcounter{tocdepth}{4}
\tableofcontents


%----------------------------------------------------------------------
%----------------------------------------------------------------------
\newfont{\cltxt}{cmr10}
\newfont{\clkwd}{cmtt10}

\newcommand{\apostrophe}{\clkwd '}
\newcommand{\bq}{\clkwd\symbol{'22}}

\newcommand{\editstart}{\begin{tabbing}
     12345 \= \kill                           %set tab1
     \bf$\Rightarrow$\ddag}
\newcommand{\editend}{\end{tabbing}}



%----------------------------------------------------------------------
%----------------------------------------------------------------------
\chapter{Introduction}

This is a proposal for both extending and modifying the Common LISP
language definition to provide a standard basis for Common LISP
support of the variety of character sets used to represent the
native languages of the international community.

This proposal was created by the Character Subcommittee of X3J13.
We would like to acknowledge the JEIDA proposal \cite{ida87}
as well as the
proposals \cite{linden87} and \cite{kerns87} for
providing the initial motivation and direction for these extensions.
As all three documents \cite{ida87,linden87,kerns87} were created
expressly for Common LISP standardization usage,
we have borrowed freely from their ideas as well as the texts
themselves.

This document is separated into two parts. The first part explains the
major language changes and their motivations.  The second part,
Appendix A, provides
the page by page set of editorial changes to \cite{steele84}.

\section{Objectives}

The major objectives of this proposal are:
\begin{itemize}
\item To provide a consistent, well-defined scheme allowing support
of both very large character sets and multiple character sets.

Many native
languages, such as Japanese and Chinese, use character
sets which contain more characters than the Roman alphabet.
Supporting larger sized character sets frequently means employing
larger data fields to uniquely encode each
character.
Common LISP implementations using
larger sized character sets
can
incur performance penalties in terms
of space, time, or both.

Many software applications are intended for international use, or
have requirements for incorporation of language elements of multiple
native
languages within a single application.
In order
to ensure some portability of these applications, data expressed in
a mixture of
native
languages must be treated consistently by the
software language.

\item To ensure efficient performance of string and character
operations.

The use of large and/or multiple character sets by an implementation
implies the need for more complex character type representation.  If
more complex character type representation is employed, the efficiency
of language operations on characters (e.g. string operations)
could be affected.

\item To assure forward compatibility of the proposed model
and definition with existing Common LISP implementations.

Developers should not be required to re-write large amounts of either
LISP code or data representations in order to apply the proposed
changes to existing implementations.
The proposed changes should provide an easy
portability path for existing code to many possible implementations.
\end{itemize}
%----------------------------------------------------------------------
%----------------------------------------------------------------------
%----------------------------------------------------------------------
%----------------------------------------------------------------------
\chapter{Overview}

We use several terms within this document which
may differ somewhat from
conventional usage.  Definitions for the following prominent
terms are provided for the reader's convenience.

A {\em character repertoire} defines a collection of characters
independent of their specific rendered image or font.  Character
repertoires are specified independent of coding and their characters
are only identified with a unique label, a graphic symbol, and
a character description.
Once defined, a character repertoire must be
{\em encoded} to allow a one-to-one mapping between a character
and a number that serves as the character code.  Once a repertoire
is encoded it is called a {\em coded character set}.
In Common LISP a {\em character} data object is identified by its
{\em character code}, a unique numerical code identification.
Each character code is composed from
a {\em character set identifier},
shared by all characters of a particular character
set, and a {\em character set index}, a numerical identification which
is unique within a particular character set.

Character data objects which are classified as {\em graphic},
or displayable, are each associated with a {\em glyph}.  The
glyph is the visual representation of the character.

%----------------------------------------------------------------------
\section{Character Identity}


Characters are uniquely distinguished by their codes,
which are drawn from the set of
non-negative integers.

It is important to separate the notion of glyph from the notion of
character data object when defining a scheme under which issues of
identity can be rigorously decided by a computer language.  Glyphs are
the visual aspects of characters, writable on surfaces, and sometimes
called ``graphics''.  A language specification valid for more than a
narrow range of systems can only make assumptions about the existence
of {\em abstract} glyphs (for example, the Latin letter A) and not about
glyph variants (for example, the italicized Latin letter {\em A})
\footnote{These latter are often referred to as {\em designer} glyphs.}
or characteristics of display devices.  Thus, a key element of this
proposal is the removal of the {\em font} and {\em bits}
attributes from the language specification.\footnote{These and other
attributes may still be supported by an implementation but they
are extensions which do not affect the identity of the character
object.}

Character codes are composed from a character set identifier and a
character set index.
Within a given character set, individual member
characters are distinguished by character set index.
\footnote{
We specifically do not propose any standard encoding for
any character repertoires.
}
An implementation need
not support more than one character set, the {\em base} character set.
If it does support multiple
character sets, it must define the sets supported and
their characteristics.  Character set identifiers are assigned to
character sets by the implementation.
\footnote{
We also do not propose any standard character set
identifiers but names such as {\clkwd :ISO8859-1988} come to mind.
}
Characters within the base character set are referred to as
{\em base characters}.  Characters not in the base character set
are referred to as {\em extended characters}.

One ramification is that the distinction between {\clkwd string-char}
and {\clkwd character} is eliminated.  {\bf All} characters can be
inserted into (type compatible) strings.
For compatibility, {\clkwd string-char}
is defined as equivalent to {\clkwd character}.  All functions
dealing with the {\em bits} and {\em font} attributes are either
removed or modified by this proposal.

A second ramification is that character codes now have two components,
and various character predicates must be modified to deal with them.
The convention by which the character set index
and character set identifier are composed into a single integer code
is implementation dependent.
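
As a purely illustrative sketch, and not a convention proposed here,
an implementation might reserve a fixed number of low-order bits of
the code for the character set index and place the character set
identifier above them.  The names {\clkwd *index-bits*},
{\clkwd compose-code}, {\clkwd code-set-id} and {\clkwd code-index},
and the 16-bit index width, are assumptions made for this example only.
\begin{verbatim}
(defparameter *index-bits* 16
  "Assumed width of the character set index field.")

(defun compose-code (set-id index)
  "Pack a character set identifier and index into one integer code."
  (logior (ash set-id *index-bits*) index))

(defun code-set-id (code)
  "Recover the character set identifier from CODE."
  (ash code (- *index-bits*)))

(defun code-index (code)
  "Recover the character set index from CODE."
  (ldb (byte *index-bits* 0) code))
\end{verbatim}
Under this layout a base character set with identifier 0 keeps codes
equal to its indices, so {\clkwd (compose-code 0 65)} is simply 65.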

A third ramification
is that the {\clkwd characterp} predicate is extended to
support testing
membership of a character in a given character repertoire
or subrepertoire.
\footnote{
For example,
testing membership in the Kanji subrepertoire.
}

The
intent of the provision for multiple character sets
is that
native
language glyph sets (with associated digits and
punctuation)
\footnote{For example, the glyphs on the keycaps of a particular
terminal, or any other glyph sets with a common use in graphics or
symbolic communication.
}
supported by user display
hardware should each be mapped by the I/O interface
into its own character set inside
LISP, all the members of which
share a common character set identifier.
\footnote{Of course, an implementation would be free to decide if and
how supported glyphs should be differentiated into sets.
}
Which glyph sets are supported by the overall computing system, the
details of the mapping of
glyphs to character set indices, and the particular character set
identifiers used, are left unspecified by Common LISP.

The diversity of glyph sets and character
encoding conventions in use worldwide and the desirability
of allowing LISP to manipulate symbolic elements from many
languages, perhaps simultaneously, mandate such a flexible approach.

%----------------------------------------------------------------------
\section{Hierarchy of Types}


A Common LISP
implementation is required to support at least one character
repertoire: the {\em base character repertoire}.
The base character repertoire
is distinguished from every other supported character repertoire in
several respects:
\begin{itemize}
\item
The standard characters are a subrepertoire of the base characters.
\item
Only members of the base character repertoire
can be elements of a base string.
\item
The base characters are, in general, the default characters for I/O
operations.
\end{itemize}
No upper bound is specified for the number of glyphs in the base
character repertoire--that
is implementation dependent.  The lower bound is 96, the
number of standard characters defined for Common LISP.
We use the term {\em extended} to describe character repertoires beyond
the base repertoire.

The following type specifier is added as a subtype
of {\clkwd character}.
\begin{itemize}
\item base-character
\end{itemize}

The distinction of a base character set is largely a pragmatic
choice.  It permits efficient handling of common situations, is
in some sense privileged for host system I/O, and can serve as an
intermediate basis for portability, less general than the standard
characters, but possibly more useful across a narrower range of
implementations.

Most computers have some "natural" character representation which
is a function of hardware instructions for dealing with characters,
as well as the organization of the file system.  The natural character
representation is likely to be the smallest transaction unit permitted
for text file and terminal I/O operations.  On a system with a record
based I/O paradigm, the natural character representation is likely to
be the smallest record quantum.  On many computer systems,
this representation is a byte.

However, there are often multiple character sets supportable on a
computer, through the use of special display and entry hardware, which
are varying interpretations of the basic system character
representation.  For example, EBCDIC and extended ASCII are two
different interpretations of the same 1-byte code representations.
Many countries have their own glyph-to-code mappings for 1-byte
character codes addressing the special requirements of national
languages.  Differentiating between these sets, without reference to
display hardware, is a matter of convention, since they all use the
same set of code representations.  When a single byte is not enough,
two or more bytes are sometimes used for character encoding.  This
makes character handling even more difficult on machines where the
natural representation size is a byte, since not only is the semantic
value of a character code a matter of convention, which may vary
within the same computing system, but so is the identification of a
set of bits as a complete character code.

It is the intention of this proposal that the base character set of
Common LISP
be the natural characters of the host system: its composition
should be
determined by the code capacity of the natural file system and I/O
transaction representations, and its assumed display glyphs should be
those of the terminals most commonly employed.
There are several advantages to this scheme.  Internal representation
of strings of just base characters can be more compact than
strings including extended characters.
Source programs are likely to consist predominantly of base characters
since the standard characters are a subset of the base character
repertoire. Parsing of pure base character text
can be more efficient than parsing of text including
extended characters.
I/O can be performed more simply
with base characters,
and they can be used as a basis for data representations to
be shared with other LISP sessions with potentially different
character set definitions or non-LISP processes.

{\em Implementation note}:
Although the readtable must be capable of
holding syntax information for all characters, the data
structure(s) used internally for the readtable may be segmented
into a section for each defined character set.  Access for
base character syntax during the parsing of base strings may
be quicker than the general case since the table section is the
same for all component characters, and entries may be accessed
with a single index by code point.


The standard characters are the 96 characters used in the Common LISP
definition {\bf or their equivalents}.

This was the Common LISP \cite{steele84} definition, but
{\em equivalents} is a vague term.

The standard characters are not defined by their glyphs, but by their
roles within the language.  There are two aspects to the roles of the
standard characters: one is their role in reader and format control
string syntax; the second is their role as components of the names of
all Common LISP
functions, macros, constants, and global variables.  As
long as an implementation chooses 96 characters
and treats those 96 in a manner consistent with
the language's specification for the standard characters (e.g.
the naming of functions), it doesn't matter what glyphs the I/O
hardware uses to represent those characters: they are the standard
characters.  Any program or
data text written wholly in those characters
is portable through simple code conversion.

A mechanism, such as in \cite{linden87}, which supports establishment of
equivalency between distinct characters is not excluded by
this proposal.
\footnote{But, as with the font character attribute,
is not a mechanism standardized by the Common LISP definition.}
In general, the authors of this proposal favor the long
term solution of ISO standardization of non-overlapping
character repertoires.

The {\clkwd string} type
is defined as
a vector of characters.  More precisely, a string
is a specialized vector whose elements are of type
{\clkwd character} or a subtype of character.  Three string types are
distinguished with standardized names: {\em base-string},
{\em most-general-string}, and {\em simple-base-string}.

A base string can only contain base characters.  A
{\clkwd most-general-string}
can contain any implementation supported base or extended characters,
in any mixture.
All Common LISP functions defined to operate on strings operate
consistently on base strings and extended strings with the following
caveat: for any function which inserts a character into a string, it
is an error to insert an extended character
into a base string.

The {\clkwd coerce} function is extended to
allow for explicit coercion between base strings and extended strings.

During reader
construction of symbols, if all the characters
in the symbol's name are of type {\clkwd base-character},
then the name of the symbol will be stored as a base string.
Otherwise it will be stored as an extended string.
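
{\em Illustrative sketch (not part of the proposal).}
The insertion caveat and the reader's choice of string type for symbol
names can be modelled as follows.  The helper names and the rule that
base characters have codes below 256 are assumptions.  The sketch also
chooses to signal an error, which the phrase ``it is an error'' does not
require of an implementation.

\begin{verbatim}
```lisp
;; Assumption for illustration: characters with codes below 256 are base
;; characters; everything else is an extended character.
(defun sketch-base-character-p (ch)
  (< (char-code ch) 256))

(defun checked-insert (string index ch &key base)
  "Store CH at INDEX in STRING.  When BASE is true, STRING is treated as a
base string and an extended character is rejected."
  (when (and base (not (sketch-base-character-p ch)))
    (error "Extended character ~S cannot go into a base string." ch))
  (setf (char string index) ch)
  string)

(defun symbol-name-storage (name)
  "Say which kind of string the reader would choose for the symbol name NAME."
  (if (every #'sketch-base-character-p name)
      :base-string
      :extended-string))
```
\end{verbatim}

For a name such as {\tt "CAR"}, every character passes the base test, so
the sketch selects the base string representation.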

The base string type allows for more compact representation of strings
of base characters, which are likely to predominate in any system.
Note that in any particular implementation the base character set
need not be the
most compactly representable character set, since another might have
fewer code points.  However, in most implementations base strings are
likely to be more space efficient than extended strings.

It has been suggested that either a single string type is
sufficient for large character set Common LISP implementations,
or that a hierarchy of string types could be used, in a manner
transparent to the user.  A desire to flexibly support many different
character sets without compromising the efficiency of ordinary
applications led us to accept the need for more than one string type.
We believe that these choices reflect a minimal
modification of this aspect of the type system, and that
exposing the string types for user programs to negotiate in their own
way is the most reasonable approach.


%----------------------------------------------------------------------
\section{Streams and System I/O}

Much of the work of ensuring that a
Common LISP implementation operates
correctly in a multiple character set environment must be performed by
the I/O interface.
The system I/O interface, abstracted in
Common LISP as streams, is responsible
for ensuring that text input from outside LISP is properly mapped
into character sets internally, and that the inverse mapping
\footnote{Such an inverse may not exist.
An implementation might legally fold multiple
external character sets into a single internal set on input
(e.g. EBCDIC and ASCII).
}
is performed on output.  It is beyond the scope of a language
definition to specify the details of this operation, but options
are specified which allow runtime indication from the user as to
what character sets a stream uses, and how the mappings
should be done.  It is expected that implementations will provide
reasonable defaults and invocation options to accommodate desired use
at an installation.

In addition to supporting conversion at the system interface, the
language must allow user programs to determine how much space data
objects will require when output in whichever external representations
are available.

Two keyword arguments are proposed as additions to {\clkwd open}:
\begin{itemize}
\item {\clkwd :character-set}
whose value would be:
\begin{itemize}
\item A name or list of names of
defined character sets in the form of keywords.
The default is the base character set when
{\clkwd :external-code-format} is also defaulted.  If a non-default
value is specified for {\clkwd :external-code-format}, there may be a
different default for {\clkwd :character-set}.
\end{itemize}
\item {\clkwd :external-code-format}
whose value would be:
\begin{itemize}
\item
A keyword indicating an implementation recognized scheme for
representing 1 or more character sets with non-homogeneous codes.
The default is the natural system character representation,
the base character representation.
As many {\clkwd :character-set} names must be provided as the
implementation requires for that external coding convention.
\footnote{
For example, the SO/SI SBCS/DBCS convention used by IBM on 370
machines could be selected by a keyword like
{\clkwd :shift-delimited}.
The compact run-encoding convention defined by XEROX could be
selected by {\clkwd :run-encoded}.
The SBCS/DBCS convention based on
ASCII which uses leading bit patterns to distinguish two-byte codes
from one-byte codes could be selected by a keyword like
{\clkwd :high-byte-delimited}.

For example, if {\clkwd :shift-delimited} were the
{\clkwd :external-code-format} argument, two character set specifiers
would have to be provided.
}
\end{itemize}
\end{itemize}
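
{\em Illustrative sketch (not part of the proposal).}
The rule that the number of {\clkwd :character-set} names must match
what the external coding convention requires could be checked as shown
below.  The function name, the table, the {\tt :default} entry, and the
character set names {\tt :latin} and {\tt :kanji} are hypothetical.
Only the count of two for {\clkwd :shift-delimited} comes from the text.

\begin{verbatim}
```lisp
;; Hypothetical table: how many character set names each external code
;; format expects.
(defparameter *format-set-counts*
  '((:default . 1) (:shift-delimited . 2)))

(defun check-character-set-option (character-set external-code-format)
  "Return the list of character set names if its length fits the format."
  (let* ((names (if (listp character-set) character-set (list character-set)))
         (entry (assoc external-code-format *format-set-counts*)))
    (cond ((null entry)
           (error "Unrecognized external code format ~S" external-code-format))
          ((/= (length names) (cdr entry))
           (error "~S needs ~D character set name(s), got ~D"
                  external-code-format (cdr entry) (length names)))
          (t names))))
```
\end{verbatim}

A call such as
{\tt (check-character-set-option '(:latin :kanji) :shift-delimited)}
returns the two names, while supplying a single name for that format
signals an error in this sketch.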

These arguments are provided for input, output, and I/O
(bidirectional) streams.  All characters read from the streams will be
members of the character sets specified by the {\clkwd :character-set}
argument.  It is an error to try to write a character other than a
member of
the specified sets to a stream.  (This includes the
\#$\backslash${\clkwd Newline} character.
Implementations should provide for appropriate line division behavior
through the function {\clkwd terpri}.)

The new function {\clkwd external-width} takes a character object
or string as its required argument.  It also takes an optional
{\em output-stream}.
It returns the number of host system character
representation quantum units
\footnote{
Same as the storage width of a base character, usually a byte.
}
required to externally store that object, using the indicated
representation convention.  If the item cannot be represented in
that convention, the function returns {\clkwd nil}.
This function is necessary
to determine if internal strings can be written to fixed length
fields in databases or terminal screen templates.  Note that this
function addresses the problem of storage width, and does not
address the problem of display width, which may involve calculating
screen width of strings printed in proportional fonts.
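
{\em Illustrative sketch (not part of the proposal).}
The storage width calculation can be modelled for one assumed external
convention.  The name {\tt sketch-external-width}, the rule that
base characters have codes below 256, and the unit costs in the comments
are assumptions.  The sketch omits the optional {\em output-stream}
argument and the {\clkwd nil} result for unrepresentable items.

\begin{verbatim}
```lisp
;; Hypothetical shift-delimited convention, for illustration only:
;;   base character (code below 256)  -> 1 unit
;;   extended character               -> 2 units
;;   each switch between the two runs -> 1 unit for a shift-out or shift-in
;;   a string ending in extended mode -> 1 final shift-in unit
(defun sketch-external-width (object)
  "Return the number of storage units OBJECT (a character or string) needs."
  (let ((extended-mode nil)
        (width 0))
    (map nil
         (lambda (ch)
           (let ((extended (if (>= (char-code ch) 256) t nil)))
             (unless (eq extended extended-mode)
               (incf width)               ; one unit for shift-out or shift-in
               (setf extended-mode extended))
             (incf width (if extended 2 1))))
         (if (characterp object) (string object) object))
    (when extended-mode
      (incf width))                       ; final shift-in back to base mode
    width))
```
\end{verbatim}

A string of three base characters needs three units under this model.
Each run of extended characters adds two shift units on top of two units
per character, which is why a fixed length field can overflow even when
the character count fits.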

An implementation supporting multiple character sets
must allow for the external and
internal representation of characters to be separately (and perhaps
multiply) specified to {\clkwd open},
since there can be circumstances under
which more than one external representation for an internal character
set is in use, or more than one character set is mixed together in an
external representation convention.

%----------------------------------------------------------------------
%----------------------------------------------------------------------

∂27-Jun-88  0848	CL-Characters-mailer 	part2   
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 27 Jun 88  08:45:32 PDT
Date: Mon, 27 Jun 88 08:39:36 PDT
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880627.083936.baggins@IBM.com>
Subject: part2

  Part 2 of the proposal is appended to this note.

Regards,
  Thom
-----------------------------------------------------------------------
%----------------------------------------------------------------------
%----------------------------------------------------------------------
\appendix
\chapter{Editorial Modifications to CLtL}

The following sections specify the editorial changes needed in
CLtL to support the proposal.  Section/subsection numbers and titles
match those found in \cite{steele84}.  The notation
{\bf $\Rightarrow$\ddag x} denotes a reference to paragraph x within the
subsection (we count each individual example or metastatement
as 1 paragraph of text).


%----------------------------------------------------------------------
\setcounter{section}{1}
\section{Data Types}                        % 2
%----------------------------------------------------------------------


\editstart 8 replace
\+
\\ \sf
   rich character set, including ways to represent characters of various
   type styles.
\-
\\ \bf with
\+
\\ \cltxt
   rich character repertoire.
\-
\editend

\setcounter{subsection}{1}
\subsection{Characters}                     % 2.2.

\editstart 1 replace
\+
\\ \cltxt
  Characters are represented as data objects of type {\clkwd character}.
\\
  There are two subtypes of interest, called
  {\clkwd standard-char} and {\clkwd string-char}.
\-
\\ \bf with
\+
\\ \cltxt
  Characters are represented as data objects of type
  {\clkwd character}.
\-
\editend

\editstart 2 replace
\+
\\ \cltxt
  This works well enough for printing characters. Non-printing
  characters
\-
\\ \bf with
\+
\\ \cltxt
  This works well enough for graphic characters.  Non-graphic
  characters
\-
\editend

\subsubsection{Standard Characters}         % 2.2.1.

\editstart 0 replace section heading
\+
\\ \cltxt
  Standard Characters
\-
\\ \bf with
\+
\\ \cltxt
  Base Characters
\-
\editend

\editstart 1 insert before
\+
\\ \cltxt
  Most computers have some "base" character representation which
  is a function
\\
  of hardware instructions for dealing with characters, as well as
  the organization of
\\
  the file system.  This base character representation is likely
  to be the smallest
\\
  transaction unit permitted for text stream I/O operations.
\\
  The base character representation (often a byte) supports an
  implementation specific
\\
  {\em coded base character set} such as the ASCII and the EBCDIC
  coded character sets.
\\
  The {\em base character repertoire} is defined as
  the collection of characters
\\
  contained in the coded base character set.  Common LISP does
  not define the base
\\
  character encoding
  but does require all implementations to support a "standard"
\\
  {\em subrepertoire} of the base character
  repertoire.
\-
\editend

\editstart 1 insert before
\+
\\ \cltxt
  The {\clkwd base-character} type is defined as a subtype of
  {\clkwd character}.  A {\clkwd base-character} object can
\\
  contain any member of the base character repertoire.  Objects of
  type
\\
  {\clkwd (and character (not base-character))} are referred to
  as {\em extended characters}.
\-
\editend

\editstart 1 replace
\+
\\ \cltxt
  Common LISP defines a "standard character set" (subtype
  {\clkwd standard-char}) for two
\\
  purposes. Common LISP programs that are written in the
  standard character set
\\
  can be read by any Common LISP implementation; and Common LISP
  programs
\\
  that use only standard characters as data objects are most likely
  to be portable.  The
\\
  Common LISP character set consists of a space character
  \#$\backslash${\clkwd Space}, a newline
\\
  \#$\backslash${\clkwd Newline}, and the following ninety-four
  non-blank printing characters or their equivalents:
\-
\\ \bf with
\+
\\ \cltxt
  As a subset of the base character repertoire,
  Common LISP defines a standard character
\\
  subrepertoire for two purposes.
\\
  Common LISP programs that are written in the
  standard character subrepertoire
\\
  can be read by any Common LISP implementation; and Common LISP
  programs
\\
  that use only standard characters as data objects are most likely
  to be portable.
\\
  The standard characters are not defined by their glyphs, but by their
  roles within
\\
  the language.  There are two aspects to the roles of the
  standard characters:
\\
  one is their role in reader and format control
  string syntax; the second is their role as
\\
  components of the names of all Common LISP
  functions, macros, constants, and global variables.  As
\\
  long as an implementation chooses 96 characters
  and treats those 96 in a manner consistent with
\\
  the language's specification for the standard characters
  (for example,
  the naming of functions),
\\
  it doesn't matter what glyphs the I/O
  hardware uses to represent those characters: they are
\\
  the standard characters.  Any program or
  data text written wholly in those characters
\\
  is portable through simple code conversion.
 The Common LISP standard character subrepertoire
\\
  consists of a space character \#$\backslash${\clkwd Space}, a newline
  \#$\backslash${\clkwd Newline}, and
\\
  the following ninety-four graphic characters or their equivalents:
\-
\editend

\editstart 1 insert the following table:
  {\bf Common LISP Standard Character Subrepertoire}
\footnote{\#$\backslash${\clkwd Space}
and \#$\backslash${\clkwd Newline} are omitted.
Graphic identifiers and descriptions are from ISO 6937/2.}
\editend

{\small \begin{tabular}{||l|c|l||l|c|l||}    \hline
  ID     &    Glyph    &  Name or description
& ID     &    Glyph    &  Name or description
\\ \hline
  LA01  &  a  &  small a
& ND01  &  1  &  digit 1
\\ \hline
  LA02  &  A  &  capital A
& ND02  &  2  &  digit 2
\\ \hline
  LB01  &  b  &  small b
& ND03  &  3  &  digit 3
\\ \hline
  LB02  &  B  &  capital B
& ND04  &  4  &  digit 4
\\ \hline
  LC01  &  c  &  small c
& ND05  &  5  &  digit 5
\\ \hline
  LC02  &  C  &  capital C
& ND06  &  6  &  digit 6
\\ \hline
  LD01  &  d  &  small d
& ND07  &  7  &  digit 7
\\ \hline
  LD02  &  D  &  capital D
& ND08  &  8  &  digit 8
\\ \hline
  LE01  &  e  &  small e
& ND09  &  9  &  digit 9
\\ \hline
  LE02  &  E  &  capital E
& ND00  &  0  &  digit 0
\\ \hline
  LF01  &  f  &  small f
& SC03  &  \$    &  dollar sign
\\ \hline
  LF02  &  F  &  capital F
& SP02  &  !     &  exclamation mark
\\ \hline
  LG01  &  g  &  small g
& SP04  &  "     &  quotation mark
\\ \hline
  LG02  &  G  &  capital G
& SP05  &  \apostrophe     &  apostrophe
\\ \hline
  LH01  &  h  &  small h
& SP06  &  (     &  left parenthesis
\\ \hline
  LH02  &  H  &  capital H
& SP07  &  )     &  right parenthesis
\\ \hline
  LI01  &  i  &  small i
& SP08  &  ,     &  comma
\\ \hline
  LI02  &  I  &  capital I
& SP09  &  \_    &  low line
\\ \hline
  LJ01  &  j  &  small j
& SP10  &  -     &  hyphen or minus sign
\\ \hline
  LJ02  &  J  &  capital J
& SP11  &  .     &  full stop, period
\\ \hline
  LK01  &  k  &  small k
& SP12  &  /     &  solidus
\\ \hline
  LK02  &  K  &  capital K
& SP13  &  :     &  colon
\\ \hline
  LL01  &  l  &  small l
& SP14  &  ;     &  semicolon
\\ \hline
  LL02  &  L  &  capital L
& SP15  &  ?     &  question mark
\\ \hline
  LM01  &  m  &  small m
& SA01  &  +     &  plus sign
\\ \hline
  LM02  &  M  &  capital M
& SA03  &  $<$   &  less-than sign
\\ \hline
  LN01  &  n  &  small n
& SA04  &  =   &  equals sign
\\ \hline
  LN02  &  N  &  capital N
& SA05  &  $>$   &  greater-than sign
\\ \hline
  LO01  &  o  &  small o
& SM01  &  \#    &  number sign
\\ \hline
  LO02  &  O  &  capital O
& SM02  &  \%    &  percent sign
\\ \hline
  LP01  &  p  &  small p
& SM03  &  \&    &  ampersand
\\ \hline
  LP02  &  P  &  capital P
& SM04  &  *     &  asterisk
\\ \hline
  LQ01  &  q  &  small q
& SM05  &  @     &  commercial at
\\ \hline
  LQ02  &  Q  &  capital Q
& SM06  &  [     &  left square bracket
\\ \hline
  LR01  &  r  &  small r
& SM07  &  $\backslash$   &  reverse solidus
\\ \hline
  LR02  &  R  &  capital R
& SM08  &  ]     &  right square bracket
\\ \hline
  LS01  &  s  &  small s
& SM11  &  \{    &  left curly bracket
\\ \hline
  LS02  &  S  &  capital S
& SM13  &  $|$     &  vertical bar
\\ \hline
  LT01  &  t  &  small t
& SM14  &  \}    &  right curly bracket
\\ \hline
  LT02  &  T  &  capital T
& SD13  &  \bq   &  grave accent
\\ \hline
  LU01  &  u  &  small u
& SD15  &  $\hat{ }$  &  circumflex accent
\\ \hline
  LU02  &  U  &  capital U
& SD19  &  $\tilde{ }$ &  tilde
\\ \hline
  LV01  &  v  &  small v
& & &
\\ \hline
  LV02  &  V  &  capital V
& & &
\\ \hline
  LW01  &  w  &  small w
& & &
\\ \hline
  LW02  &  W  &  capital W
& & &
\\ \hline
  LX01  &  x  &  small x
& & &
\\ \hline
  LX02  &  X  &  capital X
& & &
\\ \hline
  LY01  &  y  &  small y
& & &
\\ \hline
  LY02  &  Y  &  capital Y
& & &
\\ \hline
  LZ01  &  z  &  small z
& & &
\\ \hline
  LZ02  &  Z  &  capital Z
& & &
\\
\hline
\end{tabular} }

\editstart 2 delete
\editend

\editstart 3 delete
\editend

\editstart 4 delete
\editend

\editstart 5 delete
\editend

\editstart 6 replace
\+
\\ \cltxt
  Of the ninety-four non-blank printing characters
\-
\\ \bf with
\+
\\ \cltxt
  Of the ninety-four graphic characters
\-
\editend

\editstart 10 delete
\editend

\editstart 11 delete
\editend

\subsubsection{Line Divisions}              % 2.2.2.
\subsubsection{Non-standard Characters}     % 2.2.3.

\editstart delete entire section
\editend

\subsubsection{Character Attributes}        % 2.2.4.

\editstart 1 delete
\editend

\editstart 1 new
\+
\\ \cltxt
  Every object of type {\clkwd character} has three attributes:
\\
  {\sf code, character-set}, and {\sf character-set-index}.
\\
  Character identity is uniquely distinguished by either the code
\\
  attribute or the combined character-set and character-set-index
  attributes.
\\
\\
 {\bf Note: Bob Kerns is reworking the following paragraph}
\\
\\
  If an implementation has additional attributes of characters,
\\
  dealing with how the character is displayed or its typography,
\\
  these attributes are not part of the code, character-set or
\\
  character-set-index attributes.  For example, bold-face, color
\\
  or size are not considered part of the identity of a character
\\
  and are not included.  Case, however, is part of the character
  identity.
\\
  In symbol construction, implementation defined attributes such as
\\
  color are removed.
\\
  It is implementation dependent whether characters within
\\
  double quotes have any implementation defined attributes removed.
\\
  If two characters have identical implementation defined attributes,
  then their ordering by
\\
  {\clkwd char}$<$ is consistent with the numerical ordering by the
  predicate $<$ on their code
\\
  attributes.
\-
\editend

\editstart 2 delete
\editend
\editstart 3 delete
\editend
\editstart 4 delete
\editend
\editstart 5 delete
\editend

\subsubsection{String Characters}           % 2.2.5.
\editstart delete this section
\editend

\subsection{Symbols}                        % 2.3.

\editstart 12 replace
\+
\\ \cltxt
  A symbol may have uppercase letters, lowercase letters, or both
  in its print name.
\-
\\ \bf with
\+
\\ \cltxt
  A symbol may have characters from any supported character repertoire
\\
  in its print name.
\\
  It may have uppercase letters, lowercase letters, or both.
\-
\editend

\setcounter{subsection}{4}
\subsection{Arrays}
\subsubsection{Vectors}

\editstart 6 replace
\+
\\ \cltxt
  All implementations provide specialized arrays for the cases when
\\
  the components are characters (or rather, a special subset of the
  characters);
\-
\\ \bf with
\+
\\ \cltxt
  All implementations provide specialized arrays for the cases when
\\
  the components are characters (or optionally, special subsets of
  the characters);
\-
\editend

\subsubsection{Strings}

\editstart 1 replace
\+
\\ \cltxt
  A string is simply a vector of characters.  More precisely, a string
\\
  is a specialized vector whose elements are of type
  {\clkwd string-char}.
\-
\\ \bf with
\+
\\ \cltxt
  A string is simply a vector of characters.  More precisely, a string
\\
  is a specialized vector whose elements are of type
  {\clkwd character} or a subtype of character.
\-
\editend

\setcounter{subsection}{14}
\subsection{Overlap, Inclusion, and Disjointness of Types} % 2.15.

\editstart 14 replace
\+
\\ \cltxt
  The type {\clkwd standard-char} is a subtype of {\clkwd string-char};
\\
  {\clkwd string-char} is a subtype of {\clkwd character}.
\-
\\ \bf with
\+
\\ \cltxt
{\bf Compatibility note:  -------------}
\\
  The type {\clkwd standard-char} is a subtype of {\clkwd character}.
\\
  The type {\clkwd string-char} means {\clkwd character}.  Both
\\
  are retained for compatibility with earlier versions of Common LISP.
\\
{\bf --------------------------------------------}
\-
\editend

\editstart 15 replace
\+
\\ \cltxt
  The type {\clkwd string} is a subtype of {\clkwd vector},
\\
  for {\clkwd string} means {\clkwd (vector string-char)}.
\-
\\ \bf with
\+
\\ \cltxt
  The type {\clkwd string} is a subtype of {\clkwd vector};
\\
  {\clkwd string} consists of vectors specialized by subtypes of
  {\clkwd character}.
\-
\editend

\editstart 15 insert after
\+
\\ \cltxt
  The type {\clkwd most-general-string} is equivalent to
\\
  {\clkwd (vector character)} and is a subtype of {\clkwd string}.
\-
\editend

\editstart 15 insert new paragraph
\+
\\ \cltxt
  The type {\clkwd base-string} is equivalent to
\\
  {\clkwd (vector base-character)}.
\-
\editend

\editstart 20 replace
\+
\\ \cltxt
  {\clkwd (simple-array string-char (*))};
\-
\\ \bf with
\+
\\ \cltxt
  {\clkwd (simple-array character (*))};
\-
\editend

\editstart 20 insert after
\+
\\ \cltxt
  The type {\clkwd simple-base-string} is equivalent to
\\
  {\clkwd (simple-array base-character (*))} and
\\
  is the most efficient string which can hold
  the standard character repertoire.
\-
\editend





%----------------------------------------------------------------------
\setcounter{section}{3}
\section{Type Specifiers}                   % 4
%----------------------------------------------------------------------
\setcounter{subsection}{1}
\subsection{Type Specifier Lists} % 4.2.


\editstart 8 remove from table 4-1 (alphabetic list)
\+
\\ \cltxt
  {\clkwd standard-char}
\\
  {\clkwd string-char}
\-
\editend

\editstart 8 insert into table 4-1 (alphabetic list)
\+
\\ \cltxt
  {\clkwd base-character}
\\
  {\clkwd most-general-string}
\\
  {\clkwd simple-base-string}
\-
\editend

\setcounter{subsection}{2}
\subsection{Predicating Type Specifiers} % 4.3.

\editstart 2 delete
\editend

\editstart 3 delete the example
\editend

\setcounter{subsection}{5}
\subsection{Type Specifiers That Abbreviate} % 4.6.

\editstart 20 replace
\+
\\ \cltxt
  Means the same as {\clkwd (array string-char ({\em size}))}: the set of
  strings of the indicated size.
\\
\-
\\ \bf with
\+
\\ \cltxt
  Means the union of the vector types specialized by subtypes of
  character and the indicated size.
\-
\editend

\editstart 23 replace
\+
\\ \cltxt
  Means the same as {\clkwd (simple-array string-char ({\em size}))}: the
\\
  set of simple strings of the indicated size.
\\
\-
\\ \bf with
\+
\\ \cltxt
  Means the same as {\clkwd (simple-array character ({\em size}))}: the
\\
  set of simple strings of the indicated size.
\-
\editend

\editstart 23 insert after
\+
\\ \cltxt
  {\clkwd (base-string {\em size})}
\\
  Means the same as {\clkwd (array base-character ({\em size}))}: the
\\
  set of base strings of the indicated size.
\\
\-
\editend

\editstart 23 insert after
\+
\\ \cltxt
  {\clkwd (simple-base-string {\em size})}
\\
  Means the same as {\clkwd (simple-array base-character ({\em size}))}:
\\
  the set of simple base strings of the indicated size.
\\
\-
\editend


%----------------------------------------------------------------------
\setcounter{section}{5}
\section{Predicates}                        % 6
%----------------------------------------------------------------------
\editstart 2 replace
\+
\\ \cltxt
  but {\clkwd standard-char} begets {\clkwd standard-char-p}
\-
\\ \bf with
\+
\\ \cltxt
  but {\clkwd bit-vector} begets {\clkwd bit-vector-p}
\-
\editend

\setcounter{subsection}{1}
\subsection{Data Type Predicates} % 6.2.

\setcounter{subsubsection}{1}
\subsubsection{Specific Data Type Predicates} % 6.2.2.

\editstart 36 replace
\+
\\ \cltxt
  {\clkwd characterp} {\em object}
\-
\\ \bf with
\+
\\ \cltxt
  {\clkwd characterp} {\em object} \&{\clkwd optional}
  ({\em repertoire})
\-
\editend

\editstart 37 replace
\+
\\ \cltxt
  {\clkwd characterp} is true if its argument is a character, and
  otherwise is false.
\\
\-
\\ \bf with
\+
\\ \cltxt
  If {\em repertoire} is omitted, {\clkwd characterp}
  is true if its argument is a character object, and otherwise is false.
\\
\\
  If a {\em repertoire} keyword argument is specified,
  {\clkwd characterp} is true if its argument is a
\\
  character object and a member of the specified repertoire
  or subrepertoire, and otherwise is false.
\\
  For example, {\clkwd (characterp  \#$\backslash$A}
  {\clkwd :standard)}
\\
  is true since \#$\backslash$A is a member of the standard character
  subrepertoire.
\-
\editend

\editstart 38 replace
\+
\\ \cltxt
  {\clkwd (characterp x) $\equiv$ (typep x \apostrophe character)}
\\
\-
\\ \bf with
\+
\\ \cltxt
  {\clkwd (characterp x :standard) $\equiv$ (typep x \apostrophe
  (character :standard))}
\-
\editend

\editstart 72 replace
\+
\\ \cltxt
  See also {\clkwd standard-char-p, string-char-p, streamp,}
\\
\-
\\ \bf with
\+
\\ \cltxt
  See also {\clkwd standard-char-p, streamp,}
\-
\editend

\setcounter{subsubsection}{2}
\subsubsection{Equality Predicates} % 6.2.3.

\editstart 75 replace
\+
\\ \cltxt
  which ignores alphabetic case and certain other attributes
  of characters;
\\
\-
\\ \bf with
\+
\\ \cltxt
  which ignores alphabetic case
  of characters;
\-
\editend

%----------------------------------------------------------------------
\setcounter{section}{6}
\section{Control Structure}                 % 7
%----------------------------------------------------------------------

\setcounter{subsection}{1}
\subsection{Generalized Variables} % 7.2.

\editstart 19 modify table
\+
\\ \cltxt
  char               string-char
\\
  schar              string-char
\-
\\ \bf with
\+
\\ \cltxt
  char               character
\\
  schar              character
\-
\editend

\editstart 22 delete table entry
\+
\\ \cltxt
  char-bit           first                  set-char-bit
\-
\editend

%----------------------------------------------------------------------
\setcounter{section}{9}
\section{Symbols}                           % 10
%----------------------------------------------------------------------

\editstart 3 replace
\+
\\ \cltxt
  It is ordinarily not permitted to alter a symbol's print name.
\-
\\ \bf with
\+
\\ \cltxt
  It is an error to alter a symbol's print name.
\-
\editend

\setcounter{subsection}{1}
\subsection{The Print Name} % 10.2.

\editstart 5 replace
\+
\\ \cltxt
  It is an extremely bad idea
\-
\\ \bf with
\+
\\ \cltxt
  It is an error and an extremely bad idea
\-
\editend

%----------------------------------------------------------------------
\setcounter{section}{12}
\section{Characters}                        % 13
%----------------------------------------------------------------------

\setcounter{subsection}{0}
\subsection{Character Attributes} % 13.1.

\editstart 1 replace
\+
\\ \cltxt
  Every character has three attributes: code, bits, and font. The
  code attribute is
\\
  intended to distinguish among the printed glyphs and formatting
  functions for
\\
  characters.  The bits attribute allows extra flags to be associated
  with a character.
\\
  The font attribute permits a specification of the style of the glyphs
  (such as italics).
\-
\\ \bf with
\+
\\ \cltxt
  Every character has three attributes: code, character-set, and
  character-set-index.
\\
  The code attribute is intended to distinguish among glyphs and
  formatting functions for
\\
  characters.  The character-set and character-set-index attributes
  identify the character's
\\
  membership within a specific character set.  Combined, character-set
  and character-set-index
\\
  encode the same information as the code attribute.
\-
\editend

\editstart 3 append
\+
\\ \cltxt
  There may be unassigned codes between 0 and {\clkwd char-code-limit} which
\\
  are not legal arguments to {\clkwd code-char}.
\-
\editend

\editstart 4 delete
\editend

\editstart 5 delete
\editend

\editstart 6 delete
\editend

\editstart 7 delete
\editend

\editstart 8 delete
\editend

\editstart 9 delete
\editend

\setcounter{subsection}{1}
\subsection{Predicates on Characters} % 13.2.


\editstart 3 replace
\+
\\ \cltxt
  argument is a "standard character" that is, an object of type
  {\clkwd standard-char}.
\\
   Note that any character with a non-zero {\em bits} or {\em font}
   attribute is non-standard.
\-
\\ \bf with
\+
\\ \cltxt
  argument is one of the Common LISP standard character subrepertoire.
\-
\editend

\editstart 5 delete
\+
\\ \cltxt
  The semi-standard characters \#$\backslash${\clkwd Backspace},
  \#$\backslash${\clkwd Tab},
  \#$\backslash${\clkwd Rubout},
  \#$\backslash${\clkwd Linefeed},
  \#$\backslash${\clkwd Return},
  and \#$\backslash${\clkwd Page} are not graphic.
\-
\editend

\editstart 6 delete
\editend

\editstart 7 delete
\editend

\editstart 8 delete
\editend

\editstart 9 delete
\editend

\editstart 12 replace
\+
\\ \cltxt
  If a character is alphabetic, then it is perforce graphic.  Therefore
  any character
\\
  with a non-zero bits attribute cannot be alphabetic.  Whether a
  character is alphabetic
\\
  may depend on its font number.
\-
\\ \bf with
\+
\\ \cltxt
  If a character is alphabetic, then it is perforce graphic.
\-
\editend

\editstart 21 replace
\+
\\ \cltxt
  If a character is either uppercase or lowercase, it is necessarily
  alphabetic (and
\\
  therefore is graphic, and therefore has a zero bits attribute).
\\
  However, it is permissible in theory for an alphabetic character
  to be neither uppercase
\\
  nor lowercase (in a non-Roman font, for example).
\-
\\ \bf with
\+
\\ \cltxt
  If a character is either uppercase or lowercase, it is necessarily
  alphabetic (and
\\
  therefore is graphic).
\-
\editend

\editstart 24 replace
\+
\\ \cltxt
  The argument {\em char} must be a character object, and {\em radix}
  must be a non-negative
\\
  integer. If {\em char} is not a digit of the radix specified
\-
\\ \bf with
\+
\\ \cltxt
  The argument {\em char} must be in the standard character
  subrepertoire and
\\
  {\em radix} must be a non-negative integer.
\\
  If {\em char} is not a standard character or is not a digit of the
  radix specified
\-
\editend

\editstart 46 delete
\editend

\editstart 47 replace
\+
\\ \cltxt
  If two characters differ in any attribute (code, bits, or font), then
  they
\-
\\ \bf with
\+
\\ \cltxt
  If two characters differ in any attribute, then
  they
\-
\editend

\editstart 89 replace
\+
\\ \cltxt
  The predicate {\clkwd char-equal} is like {\clkwd char=}, and
  similarly for the others, except
\\
  according to a different ordering such that differences of bits
\\
  attributes and case are ignored, and font information is taken into
\\
  account in an implementation dependent manner.
\-
\\ \bf with
\+
\\ \cltxt
  The predicate {\clkwd char-equal} is like {\clkwd char=}, and
  similarly for the others, except
\\
  according to a different ordering such that differences of case and
\\
  implementation defined attributes are ignored.
\-
\editend

\editstart 93 delete
\editend

\setcounter{subsection}{2}
\subsection{Character Construction and Selection} % 13.3.

\editstart 3 replace
\+
\\ \cltxt
  this will be a non-negative integer less than the (normal) value
\-
\\ \bf with
\+
\\ \cltxt
  this will be a non-negative integer less than the value
\-
\editend

\editstart 4 delete
\editend

\editstart 5 delete
\editend

\editstart 6 delete
\editend

\editstart 7 delete
\editend

\editstart 8 replace
\+
\\ \cltxt
  {\clkwd code-char {\em code} \&optional {\em (bits 0) (font 0)}
  [{\em Function}]}
\-
\\ \bf with
\+
\\ \cltxt
  {\clkwd code-char {\em code}
  [{\em Function}]}
\-
\editend

\editstart 9 replace
\+
\\ \cltxt
  All three arguments must be non-negative integers.  If it is possible
  in the
\\
  implementation to construct a character object whose code attribute
  is {\em code}, whose
\\
  bits attribute is {\em bits}, and whose font attribute is {\em font},
  then such an object is returned;
\-
\\ \bf with
\+
\\ \cltxt
  The argument must be a non-negative integer.  If it is possible
  in the
\\
  implementation to construct a character object whose code attribute
  is {\em code},
\\
  then such an object is returned;
\-
\editend

\editstart 10 replace
\+
\\ \cltxt
  For any integers, {\em c, b,} and {\em f}, if {\clkwd (code-char
  {\em c b f})} is
\-
\\ \bf with
\+
\\ \cltxt
  For any integer, {\em c}, if {\clkwd (code-char
  {\em c})} is
\-
\editend

\editstart 12 delete
\editend

\editstart 13 delete
\editend

\editstart 14 replace
\+
\\ \cltxt
  If the font and bits attributes of a character object {\clkwd c}
  are zero, then it is the case that
\-
\\ \bf with
\+
\\ \cltxt
  If the implementation defined
  attributes of a character object {\clkwd c}
  do not exist, then
\-
\editend

\editstart 17 delete
\editend

\editstart 18 delete
\editend

\editstart 19 delete
\editend

\setcounter{subsection}{3}
\subsection{Character Conversions} % 13.4.

\editstart 8 replace
\+
\\ \cltxt
  {\clkwd char-upcase} returns a character object with the same
  font and bits attributes
\-
\\ \bf with
\+
\\ \cltxt
  {\clkwd char-upcase} returns a character object with the same
  implementation defined attributes
\-
\editend

\editstart 10 replace
\+
\\ \cltxt
  Similarly, {\clkwd char-downcase} returns a character object with the
\\
  same font and bits attributes
\-
\\ \bf with
\+
\\ \cltxt
  Similarly, {\clkwd char-downcase} returns a character object with the
\\
  same implementation defined attributes
\-
\editend

\editstart 12 delete
\editend

\editstart 13 replace
\+
\\ \cltxt
  {\clkwd digit-char {\em weight} \&optional ({\em radix} 10)
  ({\em font} 0)      [{\em Function}]}
\-
\\ \bf with
\+
\\ \cltxt
  {\clkwd digit-char {\em weight} \&optional ({\em radix} 10)
       [{\em Function}]}
\-
\editend

\editstart 14 replace
\+
\\ \cltxt
  All arguments must be integers.  {\clkwd digit-char} determines
  whether or not it is possible
\\
  to construct a character object whose font attribute is {\em font},
  and whose {\em code}
\-
\\ \bf with
\+
\\ \cltxt
  All arguments must be integers.  {\clkwd digit-char} determines
  whether or not it is possible
\\
  to construct a character object whose {\em code}
\-
\editend

\editstart 15 replace
\+
\\ \cltxt
  {\clkwd digit-char} cannot return {\clkwd nil} if {\em font}
  is zero, {\em radix}
\-
\\ \bf with
\+
\\ \cltxt
  {\clkwd digit-char} cannot return {\clkwd nil} if
  {\em radix}
\-
\editend

\editstart 22 delete
\editend

\editstart 32 replace
\+
\\ \cltxt
  All characters that have zero font and bits attributes and that are
  non-graphic
\-
\\ \bf with
\+
\\ \cltxt
  All characters that are
  non-graphic
\-
\editend

\editstart 35 delete
\editend

\setcounter{subsection}{4}
\subsection{Character Control-Bit Functions} % 13.5.

\editstart delete entire section
\editend

%----------------------------------------------------------------------
\setcounter{section}{13}
\section{Sequences}                         % 14
%----------------------------------------------------------------------
\setcounter{subsection}{0}
\subsection{Simple Sequence Functions}         % 14.1

\editstart 24 append
\+
\\ \cltxt
  If type {\clkwd string} is specified, a string of type
  {\clkwd extended-string} is returned.
\-
\editend

\setcounter{subsection}{1}
\subsection{Concatenating, Mapping, and Reducing Sequences}  % 14.2.

\editstart 3 append
\+
\\ \cltxt
  If {\em result-type} {\clkwd string} is specified, any string
\\
  subtype which can hold the elements of the sequence can be returned.
\-
\editend

\editstart 6 append
\+
\\ \cltxt
  If {\em result-type} {\clkwd string} is specified, any string
\\
  subtype which can hold the elements of the sequence can be returned.
\-
\editend

\setcounter{subsection}{2}
\subsection{Modifying Sequences}  % 14.3.

\editstart 29 append
\+
\\ \cltxt
  If {\em newitem} is of type {\clkwd string}, any string subtype
\\
  which can hold the elements of the result sequence can be returned.
\-
\editend

\editstart 36 append
\+
\\ \cltxt
  If {\em newitem} is of type {\clkwd string}, any string subtype
\\
  which can hold the elements of the result sequence can be returned.
\-
\editend

\setcounter{subsection}{4}
\subsection{Sorting and Merging}  % 14.5.

\editstart 20 append
\+
\\ \cltxt
  If {\em result-type} {\clkwd string} is specified, any string subtype
\\
  which can hold the elements of the result sequence can be returned.
\-
\editend

%----------------------------------------------------------------------
\setcounter{section}{17}
\section{Strings}                           % 18
%----------------------------------------------------------------------

\editstart 1 replace
\+
\\ \cltxt
  Specifically, the type {\clkwd string} is identical to the type
  {\clkwd (vector string-char),}
\\
  which in turn is the same as {\clkwd (array string-char (*))}.
\-
\\ \bf with
\+
\\ \cltxt
  Specifically, the type {\clkwd string} is a subtype of
  {\clkwd vector}
\\
  and consists of vectors specialized by subtypes of {\clkwd character}.
\-
\editend

\setcounter{subsection}{0}
\subsection{String Access}  % 18.1.

\editstart 3 replace
\+
\\ \cltxt
  {\clkwd schar} {\em simple-string index}             [{\em Function}]
\-
\\ \bf with
\+
\\ \cltxt
  {\clkwd schar} {\em simple-base-string index}        [{\em Function}]
\-
\editend

\editstart 4 replace
\+
\\ \cltxt
  character object.  (This character will necessarily satisfy the
  predicate {\clkwd string-char-p}).
\-
\\ \bf with
\+
\\ \cltxt
  character object.
\-
\editend

\editstart 10 replace
\+
\\ \cltxt
  it must be a simple string.
\-
\\ \bf with
\+
\\ \cltxt
  it must be a simple base string.
\-
\editend

\setcounter{subsection}{2}
\subsection{String Construction and Manipulation}  % 18.3.

\editstart 2 replace
\+
\\ \cltxt
  {\clkwd make-string {\em size} \&key :initial-element  [{\em Function}]}
\-
\\ \bf with
\+
\\ \cltxt
  {\clkwd make-string {\em size} \&key :initial-element  :element-type
  [{\em Function}]}
\-
\editend

\editstart 3 replace
\+
\\ \cltxt
  This returns a string (in fact a simple string) of length {\em size},
\-
\\ \bf with
\+
\\ \cltxt
  This returns a string of length {\em size},
\-
\editend

\editstart 5 replace
\+
\\ \cltxt
  A string is really just a one-dimensional array of "string
  characters" (that is,
\\
  those characters that are members of type {\clkwd string-char}).
\\
  More complex character arrays may be constructed using the function
  {\clkwd make-array}.
\-
\\ \bf with
\+
\\ \cltxt
  More complex character arrays may be constructed using the function
  {\clkwd make-array}.
\-
\editend

\editstart 29 replace
\+
\\ \cltxt
  If {\em x} is a string character (a character of type
  {\clkwd string-char}), then
\-
\\ \bf with
\+
\\ \cltxt
  If {\em x} is a character, then
\-
\editend

%----------------------------------------------------------------------
\setcounter{section}{21}
\section{Input/Output}                      % 22

\setcounter{subsection}{0}
\subsection{Printed Representation of LISP Objects}  % 22.1.

\setcounter{subsubsection}{0}
\subsubsection{What the Read Function Accepts}  % 22.1.1.

\editstart delete from Table 22-1: Standard Character Syntax Types
\+
\\ \cltxt
  <tab> {\em whitespace}
\\
  <page> {\em whitespace}
\\
  <backspace> {\em constituent}
\\
  <return> {\em whitespace}
\\
  <rubout> {\em constituent}
\\
  <linefeed> {\em whitespace}
\-
\editend

\setcounter{subsubsection}{1}
\subsubsection{Parsing of Numbers and Symbols}  % 22.1.2.

\editstart delete from Table 22-3: Standard Constituent Character
Attributes
\+
\\ \cltxt
  <backspace> {\em illegal}
\\
  <tab> {\em illegal}
\\
  <linefeed> {\em illegal}
\\
  <page> {\em illegal}
\\
  <return> {\em illegal}
\\
  <rubout> {\em illegal}
\-
\editend

\setcounter{subsubsection}{3}
\subsubsection{Standard Dispatching Macro Character Syntax}  % 22.1.4.

\editstart delete from Table 22-4: Standard \# Macro Character Syntax
\+
\\ \cltxt
  \#<backspace> {\em signals error}
\\
  \#<tab> {\em signals error}
\\
  \#<linefeed> {\em signals error}
\\
  \#<page> {\em signals error}
\\
  \#<return> {\em signals error}
\\
  \#<rubout> {\em undefined}
\-
\editend

\editstart ??? add
\+
\\ \cltxt
  Table 22-4 and the accompanying text are to be extended to include
  a construct for extended character objects.
\-
\editend

\editstart 11 through 18 inclusive delete
\editend

\editstart 20 through 26 inclusive delete
\editend

\editstart 108 replace
\+
\\ \cltxt
  {\clkwd \#<space>, \#<tab>, \#<newline>, \#<page>, \#<return>}
\-
\\ \bf with
\+
\\ \cltxt
  {\clkwd \#<space>, \#<newline>}
\-
\editend

\setcounter{subsubsection}{4}
\subsubsection{The Readtable}  % 22.1.5.

\editstart 3 replace
\+
\\ \cltxt
  Even if an implementation supports characters with non-zero
  {\em bits} and {\em font}
\\
  attributes, it need not (but may) allow for such characters to
  have syntax descriptions
\\
  in the readtable.  However, every character of type
  {\clkwd string-char} must be
\\
  represented in the readtable.
\-
\\ \bf with
\+
\\ \cltxt
  Even if an implementation supports extended characters, it
  need not
\\
  (but may) allow for such characters to
  have syntax descriptions
\\
  in the readtable.  However, every character of type
  {\clkwd base-character} must be
\\
  represented in the readtable.
\-
\editend

\setcounter{subsubsection}{5}
\subsubsection{What the Print Function Produces}  % 22.1.6.

\editstart 13 replace
\+
\\ \cltxt
  is used.  For example, the printed representation of the character
  \#$\backslash$A with control
\\
  and meta bits on would be \#$\backslash${\clkwd CONTROL-META-A},
  and that of
\\
  \#$\backslash$a with control and meta bits on would be
  \#$\backslash${\clkwd CONTROL-META-$\backslash$a}.
\-
\\ \bf with
\+
\\ \cltxt
  is used.
\-
\editend

\setcounter{subsection}{2}
\subsection{Output Functions}  % 22.3.

\setcounter{subsubsection}{0}
\subsubsection{Output to Character Streams}  % 22.3.1.

\editstart 27 insert after
\+
\\ \cltxt
  {\clkwd external-width} {\em object} \&{\clkwd optional}
  {\em output-stream}   [{\em Function}]
\\
\\
  {\clkwd external-width} returns the number of host system base
\\
  character units required for the object on the output-stream. If
  this is not applicable to the output
\\
  stream (for example, a display device
  with proportional fonts), the function should return {\clkwd nil}.
\-
\editend

\setcounter{subsubsection}{2}
\subsubsection{Formatted Output to Character Streams}  % 22.3.3.

\editstart 23 delete entire example
\+
\\ \cltxt
  {\clkwd (format nil "Type} $\tilde{ }$
  {\clkwd :C to $\tilde{ }$ :A."} . . .
\-
\editend

\editstart 66 replace
\+
\\ \cltxt
  $\tilde{ }${\clkwd :C} spells out the names of the control bits and
  represents non-printing
\\
  characters by their names: {\clkwd Control-Meta-F, Control-Return,
  Space}.  This is a "pretty" format for printing characters.
\-
\\ \bf with
\+
\\ \cltxt
  $\tilde{ }${\clkwd :C}
  represents non-printing
\\
  characters by their names: {\clkwd Newline,
  Space}.  This is a "pretty" format for printing characters.
\-
\editend
%----------------------------------------------------------------------

%----------------------------------------------------------------------
\setcounter{section}{22}
\section{File System Interface}             % 23

\setcounter{subsection}{1}
\subsection{Opening and Closing Files}  % 23.2.

\editstart 2 replace
\+
\\ \cltxt
  {\clkwd open {\em filename} \&key :direction :element-type}
  [{\em Function}]
\\
  {\clkwd :if-exists :if-does-not-exist}
\-
\\ \bf with
\+
\\ \cltxt
  {\clkwd open {\em filename} \&key :direction :element-type}
  [{\em Function}]
\\
  {\clkwd :external-code-format
  :character-set}
\\
  {\clkwd :if-exists :if-does-not-exist}
\-
\editend

\editstart 11 replace
\+
\\ \cltxt
  {\clkwd string-char}
\\
  The unit of transaction is a string-character.  The functions
  {\clkwd read-char}
\\
  and/or {\clkwd write-char} may be used on the stream.  This is
  the default.
\-
\\ \bf with
\+
\\ \cltxt
  {\clkwd base-character}
\\
  The unit of transaction is a base-character.  The functions
  {\clkwd read-char}
\\
  and/or {\clkwd write-char} may be used on the stream.  This is
  the default.
\-
\editend

\editstart 16 replace
\+
\\ \cltxt
  {\clkwd character}
\\
  The unit of transaction is any character, not just a string-character.
  The functions
\-
\\ \bf with
\+
\\ \cltxt
  {\clkwd character}
\\
  The unit of transaction is any character.
  The functions
\-
\editend

\editstart 19 insert after
\+
\\ \cltxt
  {\clkwd :external-code-format}
\\
  The
\-
\editend

\editstart 19 insert after
\+
\\ \cltxt
  {\clkwd :character-set}
\\
  The
\-
\editend
%----------------------------------------------------------------------

%----------------------------------------------------------------------
\begin{thebibliography}{wwwwwwww 99}


\bibitem[Ida87]{ida87} M. Ida, et al.,
{\em
JEIDA Common LISP Committee Proposal on Embedding Multi-Byte Characters
},
ANSI X3J13 document 87-022, (1987).

\bibitem[Linden87]{linden87} T. Linden,
{\em
Common LISP - Proposed Extensions for International Character Set
Handling
},
Version 01.11.87, IBM Corporation (1987).

\bibitem[Kerns87]{kerns87} R. Kerns,
{\em
Extended Characters in Common LISP
},
X3J13 Character Subcommittee document, Symbolics Inc (1987).

\bibitem[Steele84]{steele84} G. Steele Jr.,
{\em
Common LISP: the Language
},
Digital Press (1984).

\end{thebibliography}

\end{document}             % End of document.

∂30-Jun-88  0823	CL-Characters-mailer 	latex document    
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 30 Jun 88  08:23:38 PDT
Date: Thu, 30 Jun 88 08:08:34 PDT
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880630.080834.baggins@IBM.com>
Subject: latex document

Paul Beiser is having trouble getting the appendix to print.  Is
anyone else having problems printing?

Regards,
  Thom

∂14-Jul-88  1521	CL-Characters-mailer 	subcommittee document  
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 14 Jul 88  15:21:31 PDT
Date: Thu, 14 Jul 88 15:11:03 PDT
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880714.151103.baggins@IBM.com>
Subject: subcommittee document

I haven't heard any comments on the preliminary proposal.  Please
ensure it is read and your comments/corrections are made in the
next two weeks.  Remember, the first week in August is our
schedule for releasing it from subcommittee.

Regards,
  Thom

∂20-Jul-88  1346	CL-Characters-mailer 	Forwarding comments from Paul.   
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 20 Jul 88  13:45:50 PDT
Date: Wed, 20 Jul 88 13:24:31 PDT
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880720.132431.baggins@IBM.com>
Subject: Forwarding comments from Paul.

-------------------------------------------------------

I have several comments on the draft. First of all, typos. There should be
a "." after "[Steele84]" on page 1. On the same page, replace "Providing"
(under the first bulleted item) with "To provide" (to make it consistent
with the other bulleted items).

It really looks pretty good. Currently I have 4 people reviewing it
within HP (including someone in our Japan group), and their comments should
be back to me around Aug 1. I also forwarded a copy to Lucid to get
their reactions to it - after all, they will be implementing it eventually
for us!

Other comments.

*) Standard # Macro Character Syntax. I do not believe that there can be
   a standard convention here UNLESS we have standard character set
   identifiers. The proposal specifically avoids this (see footnote 4, pg
   6). I do not see how the reader could read such a character unless
   these character set identifiers were known to it. So, it looks as if
   the only way to embed such a character would be with read-time evaluation
   of functions CODE-CHAR or MAKE-CHAR, which leads me to ask: do these
   functions need a :character-set option like OPEN does?

*) I think that sticking with simple-base-string only and eliminating
   simple-string is good. However, I guess I could see the need for a
   simple-extended-string type if we could guarantee that all extended
   strings have the same "width" (that is, strings have either base characters,
   in which the widths are known, or they have characters, in which the
   width is known), but I do not think that this is part
   of the proposal (well, at least I could not find it!). Should it be?
   I guess I'm not sure.

*) I guess we need an EXTERNAL-WIDTH function if we have a multiplicity of
   character widths. This leads to another question: can't we have just 2
   widths and make them constants?? Like BASE-CHARACTER-WIDTH and
   CHARACTER-WIDTH?

*) On page 20, you have "For example (characterp #\A :standard)".  What are
   the other allowable keyword arguments here? Is :base one of them? Are
   all character set identifiers allowed here?

*) On page 21 you have "Every character has three attributes: code,
   character-set, and character-set-index". Are there functions to return
   character-set and character-set-index given a character? Are they
   setf'able?

*) MAKE-STRING has a new argument, :element-type. What are the allowable
   values here?

I guess more than anything, the draft lacks good examples to point out
questions people may have. If I'm confused, I think that is probably
the main reason. I think some of your footnotes have some good examples -
maybe we need to move those up into the text and make them fully part of
the proposal.

Another shortcoming is lack of an implementation. I think one of the strengths
of the CLOS and Error Signalling Proposals was that they had quite a
bit of experience with implementations and were able to really understand
things because of that. I know that Symbolics has an implementation, but
unfortunately I am not familiar with it, nor do I have access.

I guess we'll have lots of work to do before and at the Oct meeting!

I will get the other comments and send them to you as soon as I get them.

Regards,

Paul


∂05-Aug-88  1154	CL-Characters-mailer 	comments on draft 
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 5 Aug 88  11:54:07 PDT
Date: Fri, 05 Aug 88 10:24:20 PDT
From: Thom Linden <baggins@ibm.com>
To: Paul Beiser <paul%hpfclp.sde.hp.com@relay.cs.net>
cc: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880805.102420.baggins@IBM.com>
Subject: comments on draft

  A few comments on your comments (in the same *'ed order you listed):


1)  Standard # macro character syntax.....

    This is a good point.  In fact, I think bob kerns mentioned that
    this is the place to use the ISO glyph identifiers listed on
    p. 15.
     ie. #\LA01   is equivalent to #\a .
    This would be portable across systems.

    Unfortunately this leads to a large set of such names when
    considering, say, the Kanji glyph set, and to a corresponding
    performance burden.  Thus the rationale for
    a form which is an encoding of the
    glyphid.  The form would allow an implementation
    which can handle multiple glyph sets to provide this function
    even in an environment (eg. file system) which does not.

    A suggestion from LUCID was:   #\name:xxxx   where name
    is the character [sub]repertoire name and xxxx is the index of
    the character in hexadecimal.  Strings thus are printed
    as #( #\name:xxxx #\name:yyyy  ...  )

      Thus, for example,  #\JIS:4F35 could be read into my lisp
    implementation from a file it knows contains only standard-chars
    and treated, say, as the Yen glyph.
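
To make the suggested syntax concrete, here is a minimal sketch in Common Lisp of how a reader might split such a token. The function name parse-repertoire-token is hypothetical and not part of the proposal, and mapping the resulting name and index to an actual character is left to the implementation.

```lisp
;; Hypothetical helper (not in the proposal): split a token such as
;; "JIS:4F35" into a repertoire name and a hexadecimal index.
(defun parse-repertoire-token (token)
  "Return two values: the repertoire name as a keyword and the index
as an integer.  Signal an error unless TOKEN looks like name:xxxx."
  (let ((colon (position #\: token)))
    (unless (and colon (plusp colon) (< (1+ colon) (length token)))
      (error "Malformed character token: ~S" token))
    (values (intern (string-upcase (subseq token 0 colon)) :keyword)
            ;; PARSE-INTEGER signals an error on non-hexadecimal digits.
            (parse-integer token :start (1+ colon) :radix 16))))

(parse-repertoire-token "JIS:4F35")   ; => :JIS, 20277
```

The sketch only decodes the token; it does nothing about the cost of carrying such names through the printer, which is the concern raised here.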

    At this point, I don't like this either.  It seems to be
    supporting a (hopefully) interim problem where the lisp
    implementation has capabilities greater than its environment.

    Thus, I now think leaving things the way they are is correct.
    ie. The only standardized 'named' glyphs are #\space and #\newline.
    All others represent themselves.  #\a represents LA01..etc.  Of
    course, the file can only contain characters allowed by
    the file's :character-set and :external-code-format values.


2)  I think that sticking with simple-base-string.....
3)  I guess we need an EXTERNAL-WIDTH function ...

    The rationale here is that an implementation may have more than
    two widths just as it may have more than one variety of
    extended-character.  For example, a Korean glyph set might
    be kept in a 3 byte cell, a Kanji set in a 2 byte cell and
    the base in 1 byte.
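
A small sketch of why two constants would not be enough, using the three cell sizes from the example above. The table *cell-widths*, the repertoire names, and both function names are made up for illustration; a real EXTERNAL-WIDTH would also consult the output stream.

```lisp
;; Illustrative table only: cell width in bytes for each repertoire,
;; taken from the Korean/Kanji/base example above.
(defparameter *cell-widths* '((:base . 1) (:kanji . 2) (:korean . 3)))

(defun cell-width (repertoire)
  "Look up the assumed cell width for REPERTOIRE."
  (or (cdr (assoc repertoire *cell-widths*))
      (error "Unknown repertoire: ~S" repertoire)))

(defun external-width-sketch (repertoires)
  "Sum the cell widths for a list holding one repertoire name per character."
  (reduce #'+ repertoires :key #'cell-width))

(external-width-sketch '(:base :base :kanji :korean))   ; => 7
```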


4)  On page 20, you have "For example (characterp #\A :standard)"....

    Any character [sub]repertoire name is allowed here.  :standard is
    the only one ANSI CL defines.  Others could be unique
    to an implementation but are more likely names like :ISO8859-1988
    or :JISxxxxx.  (see page 6 for some discussion of this).
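
As a rough sketch of the intended behavior, the following uses a hypothetical name, char-in-repertoire-p, so that it runs without redefining characterp. Only :standard is handled, by deferring to standard-char-p; other repertoire names would be implementation specific and are rejected here.

```lisp
;; Hypothetical stand-in for the two-argument CHARACTERP of the draft.
(defun char-in-repertoire-p (object &optional repertoire)
  "True if OBJECT is a character and, when REPERTOIRE is given,
a member of that repertoire.  Only :STANDARD is known to this sketch."
  (and (characterp object)
       (ecase repertoire
         ((nil) t)                               ; no repertoire given
         (:standard (standard-char-p object)))))

(char-in-repertoire-p #\A :standard)   ; true
(char-in-repertoire-p #\A)             ; true
(char-in-repertoire-p 65 :standard)    ; false, 65 is not a character
```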


5)  On page 21 you have "Every character......

    The code is currently not decomposable.  There is the test above
    for the character-set.   I'm open to some function suggestions
    to extract character-set or character-set-index.
       eg. (char-character-set  char) and
           (char-character-set-index char)
    They could be setf'able.  I seem to recall some discussion of
    this previously. Unfortunately, I don't recall the details (Larry?,
    Bob?).  I would guess a problem with portability of code using
    any such functions.  In Bob Kerns's paper, he mentions (p 17)
    implementations may dynamically load character-sets and
    assign character codes on an as-need basis.  In the IBM
    proposal, we suggested char-split and char-join for decoding
    and encoding respectively.

    Another suggestion (via LUCID), was to replace char-split
    with two functions:
             (char-code-index  char-code) which takes a character code
                and returns the index and
             (char-code-set  char-code) which returns the character set.
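
For discussion, here is one possible shape for these functions. It assumes, purely for illustration, that a code packs a small character-set number above a fixed 16-bit index; nothing in the draft fixes such a layout, and an implementation that assigns codes dynamically could not use it. All names carry a sketch- prefix to mark them as hypothetical.

```lisp
(defparameter *index-bits* 16
  "Assumed width of the index field; purely illustrative.")

(defun sketch-char-join (set-number index)
  "Pack SET-NUMBER and INDEX into one integer code."
  (check-type set-number (integer 0))
  (check-type index (integer 0))
  (unless (< index (ash 1 *index-bits*))
    (error "Index ~D does not fit in ~D bits" index *index-bits*))
  (logior (ash set-number *index-bits*) index))

(defun sketch-char-code-set (code)
  "Return the character-set number stored above the index field."
  (ash code (- *index-bits*)))

(defun sketch-char-code-index (code)
  "Return the index stored in the low bits of CODE."
  (ldb (byte *index-bits* 0) code))

(let ((code (sketch-char-join 2 #x4F35)))
  (list (sketch-char-code-set code)
        (sketch-char-code-index code)))   ; => (2 20277)
```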

6)  Make-string has a new argument, :element-type.....

    Right.  The document fails to mention what :element-type
    allows.  I will amend it to say valid values are any
    character type/subtype  (eg.  :element-type '(character :standard))
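
A usage sketch follows. It uses the type names that later appeared in ANSI Common Lisp (base-char and character) so that it runs in current implementations; the draft's '(character :standard) spelling is not accepted there, so treat the specifiers below as stand-ins.

```lisp
;; Stand-in type specifiers: BASE-CHAR and CHARACTER are the ANSI CL
;; names, not the draft's (character :standard) form.
(defparameter *base-line*
  (make-string 3 :initial-element #\a :element-type 'base-char))

(defparameter *any-line*
  (make-string 3 :initial-element #\a :element-type 'character))

(list *base-line* *any-line*)   ; => ("aaa" "aaa")
```

Both calls return strings of length 3; what differs is the set of characters each string can later hold.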



Your final point on a lack of good examples is correct.  Perhaps
you can get with Larry (who volunteered for examples!) and
formulate some for insertion into the doc.  Any and all examples
from anyone are welcome!



------------------------------------------

  I intend to update the document to reflect Paul's comments #4 and #6.
  I will wait on #5 until a) people jog my memory on why not and
  b) Paul makes a specific proposal.  #1-3 I will leave as documented
  currently if nobody objects strongly (ie. makes a specific proposal).

  Also, I will change all the document references to 'deleted'
  paragraphs of CLtL to include the first ten words of the paragraph.
  This should ease the burden of the reader counting paragraphs.

  Any additional changes (eg. example insertions) should be provided
  in the next two weeks so we can make our end of month deadline
  for distribution.

--------------------------------------------

  Our voting time is here!

  Note that further changes are becoming less likely
  with the deadline coming near. Minor editorial changes
  can be made throughout August but major suggestions are unlikely
  to make the document  (eg. rework this section, etc.) unless
  you provide the work immediately!

    So, by August 15, please place your vote on the subcommittee
  forum on sail.  The vote issue is:  SHOULD THE DOCUMENT AS IT STANDS,
  barring the updates mentioned above and any other minor editorial
  changes, BE RELEASED TO X3J13 ON 31 AUGUST.

   We are quite informal so YES votes with notations
  are encouraged  (eg. I vote YES  but want the additions:  xxx
                                                changes:  xxxx
                                                deletions: xxxx
                                   and have comments: xxxx)

   NO votes MUST be accompanied with notations:
                  (eg. I vote NO but would vote YES if additions:xxxx
                                                       changes:  xxxx
                                                       deletions: xxxx
                                 and have comments: xxxx)

  If a simple majority vote is in favor the document WILL BE RELEASED
  and I will request subvotes on any addition,change,deletion
  (sub)proposals which do not simply amplify the existing document.

  If not, the document WILL NOT BE RELEASED unless another vote
  is taken.

  No vote is considered an ABSTENTION.



  I should clarify that IF RELEASED, the document is still subject
  to change by: suggestions/changes from X3J13 and by further
  amplification by our subcommittee (eg. at the October meeting).  The
  schedule I am following is:

          31 August 88    -----  release document to x3j13
          12 October 88   -----  discussion and vote by x3j13
          mid November 88 -----  final modifications made per
                                     X3J13 and subcommittee
          mid+1 November 88 -----  document to editor
             January   89 -----  ANSI Common LISP draft which
                                   includes character extensions

--------------------------------------------

  Gary and Bob,  please make sure that Mike Beckerle sees a copy
of this note (as I don't believe he is connected yet).

--------------------------------------------

  Any informal participants in this forum are also encouraged to
respond.  votes will be encouraging if YES and discouraging if NO
but won't affect the tally.  ----  comments and specific suggestions
are especially welcome!

--------------------------------------------

Sorry for the long message.  I'm on vacation next week but will be
back on the 15th.

Regards,
  Thom

∂16-Aug-88  0319	CL-Characters-mailer 	document vote
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 16 Aug 88  03:19:30 PDT
Date: Mon, 15 Aug 88 19:35:24 PDT
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880815.193524.baggins@IBM.com>
Subject: document vote

I'm back from vacation.  Per my message from 5 August, I would
like to receive your votes on the document as distributed.
So far, your responses have not arrived.  I'm expecting replies from:

      Mike  Beckerle
      Paul  Beiser
      Bob   Kerns
      Kevin Layer
      Larry Masinter

      Gary  Palter ?you joined recently, are you voting ?
      Carl  Hoffman ?haven't heard from you for quite a while,
                     are you voting  ?

  anyone else think they should be on this list?  Also, comments
are invited.


Regards,
  Thom

∂17-Aug-88  0432	CL-Characters-mailer 	document vote
Received: from ucbarpa.Berkeley.EDU by SAIL.Stanford.EDU with TCP; 17 Aug 88  04:32:02 PDT
Received: by ucbarpa.Berkeley.EDU (5.59/1.29)
	id AA16311; Wed, 17 Aug 88 04:30:33 PDT
Received: by franz (3.2/3.14)
	id AA19148; Wed, 17 Aug 88 04:01:50 PDT
Received: by feast (5.5/3.14)
	id AA00254; Tue, 16 Aug 88 22:53:27 EDT
Date: Tue, 16 Aug 88 22:53:27 EDT
From: franz!feast!smh@ucbarpa.Berkeley.EDU (Steven M. Haflich)
Message-Id: <8808170253.AA00254@feast>
To: franz!ibm.com!baggins
Cc: franz!sail.stanford.edu!cl-characters
In-Reply-To: Thom Linden's message of Mon, 15 Aug 88 19:35:24 PDT <880815.193524.baggins@IBM.com>
Subject: document vote

(I've been tracking this list silently.)

FYI, Bob Kerns happens to be out of the country right now and won't be
back for about two weeks.

∂17-Aug-88  0749	CL-Characters-mailer 	forwarding Paul's message   
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 17 Aug 88  07:49:19 PDT
Date: Tue, 16 Aug 88 14:04:23 PDT
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880816.140423.baggins@IBM.com>
Subject: forwarding Paul's message

------------------------------------------------------------




Received: from  hpfclp.sde.hp.com by IBM.COM on 08/16/88 at 10:28:10 PDT
Received: from hpfclp.sde.hp.com (hpfclp) by hplabs.HP.COM with SMTP ; Tue, 16 Aug 88 09:27:11 PST
Received: from hpfcpsb.HP.COM by hpfclp.sde.hp.com; Tue, 16 Aug 88 11:25:49 mdt
Received: from hpfcpsb by hpfcpsb.HP.COM; Tue, 16 Aug 88 11:24:48 mdt
To: Thom Linden <baggins@ibm.com>
Subject: Re: Vote
X-Mailer: mh6.5
Date: Tue, 16 Aug 88 11:24:34 MDT
Message-Id: <3505.587755474@hpfcpsb>
From: paul@hpfclp.sde.hp.com


Thom,

Welcome back!

I vote YES, with:

   *) we need more examples. I would suggest that someone with implementation
      experience (maybe Bob Kerns or someone from Lucid) furnish some examples.


Regards,

Paul

P.S. I will be at AAAI Aug 20-26th, and then I am on vacation until Sept 8.

∂17-Aug-88  1851	CL-Characters-mailer 	request for comments to X3J13 subcommittee proposal  
Received: from lucid.com by SAIL.Stanford.EDU with TCP; 17 Aug 88  18:51:31 PDT
Received: from rainbow-warrior ([192.9.200.16]) by heavens-gate id AA03609g; Wed, 17 Aug 88 15:35:34 PST
Received: by rainbow-warrior id AA27892g; Wed, 17 Aug 88 16:33:59 PDT
Date: Wed, 17 Aug 88 16:33:59 PDT
From: Dave Unietis <dru@lucid.com>
Message-Id: <8808172333.AA27892@rainbow-warrior>
To: cl-characters@sail.stanford.edu
Subject: request for comments to X3J13 subcommittee proposal
Cc: dru@lucid.com

By way of introduction, I work at Lucid Inc., where I am involved in 
adding DBCS character support to Lucid Common Lisp. 

The following is our response to a request for comments on the latest 
draft of the X3J13 character subcommittee proposal.
Although these comments are quite lengthy, and do raise several issues that
we feel merit further examination, I should say up front that we are in 
general agreement with most of the substance of the current proposal draft,
and appreciate the effort of Thom Linden and the character subcommittee
towards this standardization effort.

In rough order of importance:

Simple-strings

The type simple-string should not be eliminated.  One of the tenets
of the JEIDA proposal, reinforced in our discussions with the Japanese,
is that existing programs that work with characters, string-chars, and 
strings should continue to work unmodified with extended characters and
extended strings. We feel this design consideration to be primary.

In Lucid Common Lisp, SCHAR is used by most existing programs that manipulate
strings, because most of the time strings don't require fill pointers, etc.,
and because SCHAR is optimized by the compiler.  With the proposed elimination 
of simple strings, and redefinition of SCHAR to work only with
simple-base-strings, these programs will have to be recoded to work
with strings containing other than base characters.  

I don't understand why simple strings are considered "ambiguous", as suggested
by the cover letter. A simple string is precisely a string that does not have 
a fill pointer, is not displaced to another string, and may not have its 
size adjusted dynamically after creation.  Simple strings are no more ambiguous
than simple arrays of type T - how the data type is implemented internally is
irrelevant.  

I propose retaining the current definition of simple-string and SCHAR, and
adding a new simple-base-string accessor, SBCHAR, which is defined to operate
on simple-base-strings only.  Someone making use of such a function would 
be explicitly specifying that the string in question contains only base
characters.  The resulting type hierarchy more closely parallels the one
defined in the JEIDA proposal.
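
To make the SCHAR point concrete, here is a minimal sketch in portable
Common Lisp.  The names *SIMPLE*, *FANCY* and COUNT-CHAR-SIMPLE are
hypothetical and chosen only for illustration; they are not taken from
the proposal or from Lucid Common Lisp.

```lisp
;; A simple string: no fill pointer, not displaced, not adjustable.
(defparameter *simple* (make-string 3 :initial-element #\a))

;; A string created with a fill pointer and adjustability.  CHAR
;; works on any string, so it is the accessor to use here.
(defparameter *fancy*
  (make-array 3 :element-type 'character
                :initial-element #\a
                :fill-pointer 3
                :adjustable t))

(defun count-char-simple (ch s)
  "Count occurrences of CH in the simple string S using SCHAR."
  (declare (type simple-string s))
  (let ((n 0))
    (dotimes (i (length s) n)
      (when (char= ch (schar s i))
        (incf n)))))
```

Code written in the style of COUNT-CHAR-SIMPLE is the kind that would
need recoding if SCHAR were restricted to simple-base-strings and the
argument could contain extended characters.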


Most-general-strings

Given that the type string-char is equivalent to the type character in the
subcommittee document proposal, and given that the type string is defined as 
(vector string-char), and the type most-general-string is defined as
(vector character) (A.2.15, p19), then why aren't the types string and
most-general-string equivalent?  If they are equivalent, then as a type
definition, most-general-string is redundant. If I'm guessing correctly the 
intent of the definition of most-general-string, it is to provide a
declaration that indicates that the string in use is not a 
base-character-only ("thin") string.  We wrestled with the problem of 
providing an adequate definition of such a type, and came to the conclusion 
that the increase in performance such a data type might provide 
did not warrant adding more hair to the array-type gorilla. 


Equivalence classes

Our discussions with Japan indicate that this issue is not going to go away.
In fact, the next draft of the JEIDA proposal, due next month, is rumored
to have recommendations regarding treatment of double-byte "alphabetic"
(i.e. English) characters.

I agree that defining dynamically-modifiable equivalence classes has 
serious flaws, even if the equivalence state is rebindable, among which are
that symbol EQ-ness is not preserved, and that hash keys may be invalidated. 

However, if a character's equivalence class is treated as a static property,
these problems disappear.  That is, a character's equivalence class is defined
to be a property similar to whether or not the character is a graphics, digit,
or uppercase character.  The process of character canonicalization, 
as described in Linden 87, seems no more arbitrary than the current
case-conversion by the reader and case-insensitivity of some of the string 
and character predicates.  

I feel this mechanism should be retained and that equivalence classes should be
defined statically as properties of the character set(s) supported by 
an implementation.
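
As a rough sketch of what a static equivalence class could look like,
the function below maps a double-byte "alphabetic" character to its
single-byte equivalent by a fixed rule.  The name CANONICAL-CHAR is
hypothetical, and the code values assume an implementation whose
character codes follow Unicode (full-width forms at #xFF01 through
#xFF5E); neither assumption comes from the proposal or from Linden 87.

```lisp
(defun canonical-char (ch)
  "Return the single-byte equivalent of a full-width Latin character,
or CH itself if it is not in the full-width range."
  (let ((code (char-code ch)))
    (if (<= #xFF01 code #xFF5E)
        ;; Full-width forms sit at a fixed offset from #x21 .. #x7E.
        (code-char (- code #xFEE0))
        ch)))
```

Because the mapping is a fixed function of the character code, symbol
EQ-ness and hash keys computed after canonicalization stay valid for
the lifetime of the image.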


Character code components, character attributes

The latest draft of the proposal seems to be heading in the right 
direction, where it states "The convention by which the character set
index and character set identifier are composed into a single integer code
is implementation dependent."  However, I feel it doesn't go far enough. 
Given that the information from a character's character-set and
character-set-index is captured in its character code, then it is 
unnecessary to elevate these properties to the level of attributes, as 
described in A.13.1 (p 21).  The character set and index of a character
are simply properties, just as whether or not the character is a digit, etc.
are properties.  Given all this, it is unnecessary to define functions to
extract or set these "components" of a character.  As a matter of fact, 
I'm not sure what meaning character-set-index has as a Common Lisp 
construct.  It is not mentioned in any of the other function definitions 
in Appendix A.  I agree that an implementation would be wise to 
document the mapping from a character's external representation to its
character code, but other than that I don't see what else is necessary. 


EXTERNAL-WIDTH, and FORMAT

I feel that an EXTERNAL-WIDTH function (WRITE-WIDTH in the JEIDA proposal)
is necessary.  It is easy to try and write this one off as not part of the
language definition, but I think we are blinded by the fact that in 
most popular English-only character sets it is always true that 
(= width-in-characters width-in-external-code-format-units), and that 
the difficulties when this is not the case are not properly appreciated. 

For example, there is a problem in the way that FORMAT currently interprets 
numeric parameters to directives.  Our original plan was to interpret such 
parameters as meaning number of characters, which would require no change to
the language definition.  The Japanese have convinced us that it is far more
useful to define numeric parameters as meaning the number of bytes required in
the external code format associated with the stream argument to FORMAT. This
allows these directive parameters to be used in producing columnar output, as
long as the width in bytes of the external code format corresponds to the
resulting width of the displayed or printed output, which seems to be the
usual case. At first, we were reluctant to consider introducing an "external" 
meaning to an "internal" function such as FORMAT, but after further
consideration, we decided that FORMAT is the appropriate place for this type
of processing.

There is a problem, however, in deciding what to do when NIL is specified as
the stream argument to FORMAT, particularly when used to produce a string
that will in turn be passed as an argument to a subsequent FORMAT.  
Also, it would be useful to be able to specify that numeric parameters be 
interpreted as number of characters, regardless of the destination stream 
argument. Rather than clutter up the already-tortured definition of FORMAT, 
we suggest adding the following variable:

   *FORMAT-EXTERNAL-WIDTH* - specifies how numeric parameters in a format
   control string are interpreted.  It can have one of the following values:

        T             With this value, FORMAT uses the destination stream type
                      to interpret numeric parameters as external format units 
                      for this type of stream; if the destination stream type
                      is NIL, numeric parameters are interpreted as characters.
                      This value is the default.

        NIL           With this value, FORMAT interprets numeric parameters as
                      characters, regardless of the destination stream type.

        external      If the value is a keyword that specifies an external code
        format        format recognized by the implementation, FORMAT
                      interprets numeric parameters as external format units
                      when the destination stream is NIL.  If the destination 
                      stream type is non-NIL, this value has no effect.

Note that for streams of only base-characters, width in characters = width in 
external format units, and the values T and NIL above are equivalent.  
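
The difference between the two interpretations can be sketched with a
toy model.  The functions below are hypothetical and are not part of
any proposal: they assume an external code format in which characters
with codes below 256 occupy one unit and all other characters occupy
two, and an implementation whose CHAR-CODE-LIMIT is large enough to
hold such characters.

```lisp
(defun external-width (string)
  "Units needed to write STRING in the toy double-byte format:
one unit for character codes below 256, two units otherwise."
  (reduce #'+ string
          :key (lambda (ch) (if (< (char-code ch) 256) 1 2))
          :initial-value 0))

(defun pad-to-external-width (string width)
  "Right-pad STRING with spaces until its external width reaches WIDTH."
  (let ((missing (max 0 (- width (external-width string)))))
    (concatenate 'string string
                 (make-string missing :initial-element #\Space))))
```

Padding by character count would give a two-character extended string
four spaces to reach column 6, overshooting by two units on output;
padding by external width gives it two spaces, which is what columnar
output needs.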


Printing characters

The main problem here seems to be to decide what to do when extended 
characters are written to a base-character only stream, as existing 
mechanisms are sufficient for unrestricted streams.

In escape mode, characters other than base-characters that are written to
a base-character-only stream could be written using an extended definition 
of char-name, like the one used by Lucid Common Lisp described below. 
This probably isn't general enough to warrant inclusion in the language, 
however, except perhaps to note that all characters may be printed to 
any stream when in escape mode in some implementation-dependent manner.

In non-escape mode, the problem is more difficult.  Given the problems 
in developing a general character-by-character encoding with escape
characters, as suggested earlier by Larry Masinter, I think the right thing 
to do here is just punt and say "It is an error" to write extended 
characters to a base-character-only stream in non-escape mode.

In current implementations of Lucid Common Lisp, all characters may be read 
in the following form: #\cxx, where xx is the character code in hexadecimal.
Cxx is the char-name for all non-printable characters that do not have a more
mnemonic name, and is used when printing these characters in escape mode.
This mechanism could be extended to extended characters by simply adding
hexadecimal digits.  Extended characters could then be read from and
written to base-character-only streams using this syntax.  I don't feel 
that including the character set name in the syntax is necessary, as this
information is not explicitly retained when characters are written to 
unrestricted streams.
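
A minimal sketch of the naming scheme, written in portable Common Lisp
rather than taken from the Lucid sources: HEX-CHAR-NAME and
CHAR-FROM-HEX-NAME are hypothetical helpers, and the choice of at
least two hexadecimal digits is an assumption.

```lisp
(defun hex-char-name (ch)
  "Return a name of the form Cxx, where xx is the character code
of CH in hexadecimal, padded to at least two digits."
  (format nil "C~2,'0X" (char-code ch)))

(defun char-from-hex-name (name)
  "Inverse of HEX-CHAR-NAME: skip the leading C and parse the
remaining digits as a hexadecimal character code."
  (code-char (parse-integer name :start 1 :radix 16)))
```

Extended characters simply produce longer names, so the same pair of
functions covers both base and extended characters without naming the
character set.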


Storing extended characters in base strings.

I'm assuming that "this is an error"; if so it needs to be noted
in the appropriate places in CLtL (setf of sbchar, replace, etc.)


Glyphs and repertoires

I guess I just don't understand what this is all about.  After several 
readings of the relevant sections of the proposal, I think I understand the
glyph/character and character set/character repertoire abstractions, but it 
still strikes me as much ado about nothing.  Of course it is possible
for display devices, printers, keyboards, operating systems and window
systems to display, print, input, and/or translate characters in any manner
whatsoever, but what of any of this has anything to do with Common Lisp?

As for Common Lisp itself, of course it shares the same freedom as any other
piece of hardware or software in this regard, and if an implementation chooses
it could interpret the glyph "a" typed by a user as LZ01, I suppose.
At most it seems that the following fact is worth noting:
"An implementation may choose to document idiosyncrasies of the way
some characters are mapped from I/O devices to internal 'graphics symbols'
and still call itself Common Lisp." (the inability of most IBM terminals
to print the character [ comes to mind).

Certainly nobody is proposing that the glyphs used throughout the 
definitions of "all Common Lisp functions, macros, constants, and global
variables" in CLtL be replaced with the corresponding character IDs
from the table of A.2.2.1 (p14).  Of course not, because the glyphs used in
examples in the language definition as well as the glyphs used in any
reasonable implementation had better correspond pretty closely to
the standard glyphs in the table in A.2.2.1.

On the other hand, maybe I just don't understand the issues here.  If so,
I don't think I'll be alone in this regard, so perhaps more motivating 
arguments for the introduction of this terminology should be added to the
documentation.


JEIDA proposal

As I mentioned earlier, I believe that a new draft of the JEIDA proposal
is due sometime in September.  Are there plans for including input from this
source in the final draft that is presented to X3J13 in October?


Typos

Although the definitions have been dropped from this draft of the proposal,
the terms "extended string" and "code point" still occur in several places.



David Unietis 
Lucid, Inc.

∂14-Sep-88  1206	CL-Characters-mailer 	DC meeting   
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 14 Sep 88  12:05:53 PDT
Date: Wed, 14 Sep 88 11:40:06 PDT
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880914.114006.baggins@IBM.com>
Subject: DC meeting

  The results of our voting were:

      Linden  --  yes
      Masinter -- no vote received
      Beckerle -- yes
      Kerns    -- no vote received (told he is unavailable in Japan)
      Beiser   -- yes
      Layer    -- yes

  Thus, we will distribute the document to X3J13.  I am finishing
  some editorial modifications and will incorporate many of the
  comments received (these will be discussed
  in a separate note and at DC).

----------------------------------------

  I discovered Carl Hoffman is no longer at ILA  ..  since he
  hasn't been active on the subcommittee, and hasn't seen the
  proposal (to my knowledge)  I won't list him on the front.

----------------------------------------

  We also received significant comments from LUCID, and general
  agreement with the proposal.

----------------------------------------

  I would like to hold an all day meeting on Monday 10 Oct prior
to the next X3J13 meeting.  I'll get back with the location and
precise times (circa 9 to 5).

  The subject of the meeting is to discuss any and all points on
the CS proposal.



Regards,
  Thom

∂14-Sep-88  1403	CL-Characters-mailer 	DC meeting   
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 14 Sep 88  14:02:52 PDT
Date: Wed, 14 Sep 88 12:02:37 PDT
From: Thom Linden <baggins@ibm.com>
To: "Robert F. Mathis" <mathis@a.isi.edu>
cc: Jan Zubkoff <edsel!jlz@labrea.stanford.edu>,
    "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880914.120237.baggins@IBM.com>
Subject: DC meeting

Bob/Jan,
  The Characters subcommittee will be meeting on Monday 10 Oct.
  We need a room from 9:30 to 5pm for 4 to 6 attendees.

  We also have a proposal to submit to the full committee.  I hope to
  have the final revisions completed today.  Please let me know how
  you would like it distributed (vnet to you, mail to you, both?)
  It is written using only a few facilities of LaTeX (esp. tabular).
  If you send me a mailing list, I would also be able to
  distribute printed copies (as early as tomorrow).


  I would expect this proposal to take at least 3 hours of full
  committee time (with about 1 hr subcommittee review at the start).


  I wish to have a position vote by the full committee at the
  DC meeting on the following:


    1a) accept, future revisions to be handled by editorial subcommittee
 or 1b) accept, with direction for revisions in specific sections,
          specific sections to be revised by the characters subcommittee

    2) submit (stipulating 1a or 1b) to ISO at their November meeting


Regards,
  Thom

∂18-Sep-88  1649	CL-Characters-mailer 	DC meeting   
Received: from AI.AI.MIT.EDU by SAIL.Stanford.EDU with TCP; 18 Sep 88  16:49:20 PDT
Date: Sun, 18 Sep 88 19:54:41 EDT
From: "Robert W. Kerns" <RWK@AI.AI.MIT.EDU>
Subject:  DC meeting
To: baggins@IBM.COM
cc: cl-characters@SAIL.STANFORD.EDU
Message-ID: <445848.880918.RWK@AI.AI.MIT.EDU>

    Date: Wed, 14 Sep 88 11:40:06 PDT
    From: Thom Linden <baggins at ibm.com>
      The results of our voting were:
          Linden  --  yes
          Masinter -- no vote received
          Beckerle -- yes
          Kerns    -- no vote received (told he is unavailable in Japan)
Here I am. Actually, I've been back for about three weeks, but it's
taken a while to get my modem hooked up again, since I moved my Mac.
I plan to make other arrangements for mail shortly, anyway.

If someone could please send me a copy of the document, as either Ascii text or
Microsoft word format, on either IBM 360K or 1.2M 5.25" floppies or Mac
floppies, I'll see about getting you comments ASAP. Thanks.

          Beiser   -- yes
          Layer    -- yes

      Thus, we will distribute the document to X3J13.  I am finishing
      some editorial modifications and will incorporate many of the
      comments received (these will be discussed
      in a separate note and at DC).

    ----------------------------------------

      I discovered Carl Hoffman is no longer at ILA  ..  since he
      hasn't been active on the subcommittee, and hasn't seen the
      proposal (to my knowledge)  I won't list him on the front.
He spends most of his time in Japan these days, but is here in
the US at the moment. If you'll get me a copy I'll get a copy to
him, if he's still interested.

    ----------------------------------------

      We also received significant comments from LUCID, and general
      agreement with the proposal.

    ----------------------------------------

      I would like to hold an all day meeting on Monday 10 Oct prior
    to the next X3J13 meeting.  I'll get back with the location and
    precise times (circa 9 to 5).

      The subject of the meeting is to discuss any and all points on
    the CS proposal.



    Regards,
      Thom

∂23-Sep-88  1038	CL-Characters-mailer 	october meeting note   
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 23 Sep 88  10:38:41 PDT
Date: Fri, 23 Sep 88 09:53:06 PDT
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880923.095306.baggins@IBM.com>
Subject: october meeting note

I'm mailing the following note along with the proposal today.  I'll
also send the LaTeX form of the proposal to cl-characters (split
into parts due to postal problems).

Regards,
  Thom

-----------------------------------------------------------------

   The Characters subcommittee proposal for extending Common LISP
to support multiple and large character sets is a topic for
discussion and vote at the Washington D.C. meeting in October.

   I have included a copy of the proposal for your review.  I would
encourage editorial comments and minor corrections be sent directly
to me at the address above or via
csnet to cl-characters@sail.stanford.edu.  Other review comments
may be sent to common-lisp@sail.stanford.edu or stated at the
October meeting.

   The characters subcommittee is requesting the following
position votes by X3J13 at the Washington D.C. meeting:

  1a) Accept for inclusion in the draft standard.
  1b) Accept conditionally with
specific revisions to be incorporated by the characters subcommittee.
  2) Submit the proposal (stipulating 1a or 1b) to
ISO WG16 at their November meeting.

∂23-Sep-88  1040	CL-Characters-mailer 	proposal part 1   
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 23 Sep 88  10:38:56 PDT
Date: Fri, 23 Sep 88 09:55:21 PDT
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880923.095521.baggins@IBM.com>
Subject: proposal part 1


\documentstyle{report}     % Specifies the document style.

\pagestyle{headings}

\title{\bf DRAFT:
Extensions to Common LISP to Support International
Character Sets}
\author{
Michael Beckerle\thanks{Gold Hill Computers} \and
Paul Beiser\thanks{Hewlett-Packard} \and
Robert Kerns\thanks{Independent consultant} \and
Kevin Layer\thanks{Franz, Inc.} \and
Thom Linden\thanks{IBM Research, Subcommittee Chair} \and
Larry Masinter\thanks{XEROX Research}
}
\date{Sept 9, 1988}   % Deleting this command produces today's date.

\begin{document}

\maketitle                 % Produces the title.

\setcounter{secnumdepth}{4}

\setcounter{tocdepth}{4}
\tableofcontents


%----------------------------------------------------------------------
%----------------------------------------------------------------------
\newfont{\cltxt}{cmr10}
\newfont{\clkwd}{cmtt10}

\newcommand{\apostrophe}{\clkwd '}
\newcommand{\bq}{\clkwd\symbol{'22}}


%----------------------------------------------------------------------
%----------------------------------------------------------------------
\chapter{Introduction}

This is a proposal for both extending and modifying the Common LISP
language definition to provide a standard basis for Common LISP
support of the variety of character sets used to represent the
native languages of the international community.

This proposal was created by the Character Subcommittee of X3J13.
We would like to acknowledge discussions with T. Yuasa and other
members of the JEIDA Technical Working Group,
comments on earlier versions of this proposal by David Unietis at
LUCID Inc.,
the JEIDA proposal \cite{ida87}
as well as the
proposals \cite{linden87} and \cite{kerns87} for
providing the initial motivation and direction for these extensions.
As all these documents and discussions were
expressly for Common LISP standardization usage,
we have borrowed freely from their ideas as well as the texts
themselves.

This document is separated into two parts. The first part explains the
major language changes and their motivations.  The second part,
Appendix A, provides
the page by page set of editorial changes to \cite{steele84}.

\section{Objectives}

The major objectives of this proposal are:
\begin{itemize}
\item To provide a consistent, well-defined scheme allowing support
of both very large character sets and multiple character sets.

Many native
languages, such as Japanese and Chinese, use character
sets which contain more characters than the Roman alphabet.
Supporting larger sized character sets frequently means employing
larger data fields to uniquely encode each
character.
Common LISP implementations using
larger sized character sets
can
incur performance penalties in terms
of space, time, or both.

Many software applications are intended for international use, or
have requirements for incorporation of language elements of multiple
native
languages within a single application.
In order
to ensure some portability of these applications, data expressed in
a mixture of
native
languages must be treated consistently by the
software language.

\item To ensure efficient performance of string and character
operations.

The use of large and/or multiple character sets by an implementation
implies the need for a more complex character type representation.
Given a more complex character representation, the efficiency
of language operations on characters (e.g. string operations)
could be affected.

\item To assure forward compatibility of the proposed model
and definition with existing Common LISP implementations.

Developers should not be required to re-write large amounts of either
LISP code or data representations in order to apply the proposed
changes to existing implementations.
The proposed changes should provide an easy
portability path for existing code to many possible implementations.
\end{itemize}
%----------------------------------------------------------------------
%----------------------------------------------------------------------
%----------------------------------------------------------------------
%----------------------------------------------------------------------
\chapter{Overview}

We use several terms within this document which
are new in the context of Common LISP.
Definitions for the following prominent
terms are provided for the reader's convenience.

A {\em character repertoire} defines a collection of characters
independent of their specific rendered image or font.  Character
repertoires are specified independent of coding and their characters
are only identified with a unique label, a graphic symbol, and
a character description.
Once defined, a character repertoire must be
{\em encoded} to allow a one-to-one mapping between a character
and a number that serves as the character code.  An encoded repertoire
is called a {\em coded character set}.

In Common LISP a {\em character} data object is identified by its
{\em character code}, a unique numerical code identification.
Each character code is composed from
a {\em character set identifier},
shared by all characters of a particular character
set, and a {\em character set index}, a numerical identification which
is unique within a particular character set.

Character data objects which are classified as {\em graphic},
or displayable, are each associated with a {\em glyph}.  The
glyph is the visual representation of the character.

The primary purpose of introducing these terms is to provide a
consistent naming to Common LISP concepts which are identical
to those found in ISO standardization of coded
character sets.  They also serve as a demarcation between these
standardization activities.  For example, while Common LISP is free to
define unique repertoires and facilities to manipulate them, it should
not define character encodings.

%----------------------------------------------------------------------
\section{Character Identity}


Characters are uniquely distinguished by their codes,
which are drawn from the set of
non-negative integers.

It is important to separate the notion of glyph from the notion of
character data object when defining a scheme under which issues of
identity can be rigorously decided by a computer language.  Glyphs are
the visual aspects of characters, writable on surfaces, and sometimes
called 'graphics'.  A language specification valid for more than a
narrow range of systems can only make assumptions about the existence
of {\em abstract} glyphs (for example, the Latin letter A) and not about
glyph variants (for example, the italicized Latin letter {\em A})
\footnote{the latter are often referred to as {\em designer} glyphs}
or characteristics of display devices.  Thus, a key element of this
proposal is the removal of the {\em font} and {\em bits}
attributes from the language specification.\footnote{These and other
attributes may still be supported by an implementation but they
are extensions which do not affect the
{\clkwd char-equal} identity of the character
object.}

Character codes are composed from a character set identifier and a
character set index.
Within a given character set, individual member
characters are distinguished by character set index.
\footnote{
We specifically do not propose any standard encoding for
any character repertoires.
}
An implementation need
not support more than one character set, the {\em base} character set.
If it does support multiple
character sets, it must define the sets supported and
their characteristics.  Character set identifiers are assigned to
character sets by the implementation.
\footnote{
We do not propose any standard character set
identifiers but names such as {\clkwd :ISO8859-1988} come to mind.}
The convention by which the character set index
and character set identifier are composed into a single integer code
is implementation dependent.
Characters within the base character set are referred to as
{\em base characters}.  Characters not in the base character set
are referred to as {\em extended characters}.

One ramification is that the distinction between {\clkwd string-char}
and {\clkwd character} is eliminated.  {\bf All} characters can be
inserted into (type compatible) strings.
For compatibility, {\clkwd string-char}
is defined as equivalent to {\clkwd character}.  All functions
dealing with the {\em bits} and {\em font} attributes are either
removed or modified by this proposal.

A second ramification
is that the {\clkwd characterp} predicate is extended to
support testing
membership of a character in a given character repertoire
or subrepertoire.
\footnote{
For example,
testing membership in the Kanji subrepertoire.
}

A third ramification is that I/O functions must be modified to manage
the interaction between the Common LISP treatment of character sets and
the external environment.

The
intent of the provision for multiple character sets
is that
native
language glyph sets (with associated digits and
punctuation)
\footnote{For example, the glyphs on the keycaps of a particular
terminal, or any other glyph sets with a common use in graphics or
symbolic communication.
}
supported by user display
hardware should each be mapped by the I/O interface
into its own character set inside
LISP, all the members of which
share a common character set identifier.
\footnote{Of course, an implementation would be free to decide if and
how supported glyphs should be differentiated into sets.
}
Which glyph sets are supported by the overall computing system, the
details of the mapping of
glyphs to character set indices, and the particular character set
identifiers used, are left unspecified by Common LISP.

The diversity of glyph sets and character
encoding conventions in use worldwide and the desirability
of allowing LISP to manipulate symbolic elements from many
languages, perhaps simultaneously, mandate such a flexible approach.

%----------------------------------------------------------------------
\section{Hierarchy of Types}


A Common LISP
implementation is required to support at least one character
repertoire: the {\em base character repertoire}.
The base character repertoire
is distinguished from every other supported character repertoire in
several respects:
\begin{itemize}
\item
The standard characters are a subrepertoire of the base characters.
\item
Only members of the base character repertoire
can be elements of a base string.
\item
The base characters are, in general, the default characters for I/O
operations.
\end{itemize}
No upper bound is specified for the number of glyphs in the base
character repertoire--that
is implementation dependent.  The lower bound is 96, the
number of standard characters defined for Common LISP.
We use the term {\em extended} to describe character repertoires beyond
the base repertoire.

The following type specifier is added as a subtype
of {\clkwd character}.
\begin{itemize}
\item {\clkwd base-character}
\end{itemize}

An implementation may support additional subtypes of {\clkwd character}
which may or may not be supertypes of {\clkwd base-character}.


The distinction of a base character set is largely a pragmatic
choice.  It permits efficient handling of common situations, is
in some sense privileged for host system I/O, and can serve as an
intermediate basis for portability, less general than the standard
characters, but possibly more useful across a narrower range of
implementations.

Most computers have some "natural" character representation which
is a function of hardware instructions for dealing with characters,
as well as the organization of the file system.  The natural character
representation is likely to be the smallest transaction unit permitted
for text file and terminal I/O operations.  On a system with a record
based I/O paradigm, the natural character representation is likely to
be the smallest record quantum.  On many computer systems,
this representation is a byte.

However, there are often multiple character sets supportable on a
computer, through the use of special display and entry hardware, which
are varying interpretations of the basic system character
representation.  For example, EBCDIC and extended ASCII are two
different interpretations of the same 1-byte code representations.
Many countries have their own glyph-to-code mappings for 1-byte
character codes addressing the special requirements of national
languages.  Differentiating between these sets, without reference to
display hardware, is a matter of convention, since they all use the
same set of code representations.  When a single byte is not enough,
two or more bytes are sometimes used for character encoding.  This
makes character handling even more difficult on machines where the
natural representation size is a byte, since not only is the semantic
value of a character code a matter of convention, which may vary
within the same computing system, but so is the identification of a
set of bits as a complete character code.

It is the intention of this proposal that the base character set of
Common LISP
be the natural characters of the host system: its composition
should be
determined by the code capacity of the natural file system and I/O
transaction representations, and its assumed display glyphs should be
those of the terminals most commonly employed.
There are several advantages to this scheme.  Internal representation
of strings of just base characters can be more compact than
strings including extended characters.
Source programs are likely to consist predominantly of base characters
since the standard characters are a subset of the base character
repertoire. Parsing of pure base character text
can be more efficient than parsing of text including
extended characters.
I/O can be performed more simply
with base characters,
and they can be used as a basis for data representations to
be shared with other LISP sessions with potentially different
character set definitions or non-LISP processes.

{\em Implementation note}:
Although the readtable must be capable of
holding syntax information for all characters, the data
structure(s) used internally for the readtable may be segmented
into a section for each defined character set.  Access for
base character syntax during the parsing of base strings may
be quicker than the general case since the table section is the
same for all component characters, and entries may be accessed
directly by character set index.

The standard characters are the 96 characters used in the Common LISP
definition {\bf or their equivalents}.

This was the Common LISP \cite{steele84} definition, but
{\em equivalents} is a vague term.

The standard characters are not defined by their glyphs, but by their
roles within the language.  There are two aspects to the roles of the
standard characters: one is their role in reader and format control
string syntax; the second is their role as components of the names of
all Common LISP
functions, macros, constants, and global variables.  As
long as an implementation chooses 96 characters
and treats those 96 in a manner consistent with
the language's specification for the standard characters (e.g.
the naming of functions), it doesn't matter what glyphs the I/O
hardware uses to represent those characters: they are the standard
characters.  Any program or
data text written wholly in those characters
is portable through simple code conversion.

A mechanism, such as in \cite{linden87}, which supports establishment of
equivalency between distinct characters is not excluded by
this proposal.
\footnote{But, as with the font character attribute,
it is not a mechanism standardized by the ANSI Common LISP definition.}
In general, the authors of this proposal favor the alternative
of ISO standardization of non-overlapping
coded character sets.\footnote{Given the difficulties inherent in the
international standardization process, this may not be a
realistic alternative.}

The {\clkwd string} type
is defined as
a vector of characters.  More precisely, a string
is a specialized vector whose elements are of type
{\clkwd character} or a subtype of {\clkwd character}.  Three string
types are distinguished with standardized names: {\em base-string},
{\em most-general-string}, and {\em simple-base-string}.
All strings which are not base strings
are referred to as {\em extended strings}.

A base string can only contain base characters.  A
{\clkwd most-general-string}
can contain any implementation supported base or extended characters,
in any mixture.
All Common LISP functions defined to operate on strings operate
consistently on base strings and extended strings with the following
caveat: for any function which inserts a character into a string, it
is an error to insert an extended character
into a base string.

An implementation may support string subtypes more general
than {\clkwd base-string} but more specialized than
{\clkwd most-general-string}.
For example, a hypothetical
implementation supporting Korean and Russian repertoires
might provide:
\begin{itemize}
\item {\clkwd most-general-string} -- may contain Korean, Cyrillic or
base characters in any mixture.
\item {\clkwd region-specialized-string} -- may contain installation
selected repertoire (Korean/Cyrillic) or base characters in any
mixture.
\item {\clkwd base-string} -- may contain base characters
\end{itemize}
Though, clearly, portability of applications using
{\clkwd region-specialized-string} is limited, a performance
advantage might argue for its use.

Alternatively,
an implementation may define {\clkwd most-general-string}
as equivalent to {\clkwd base-string} and {\clkwd base-character}
as equivalent to {\clkwd character} in a host environment
supporting a large base character repertoire
including, say, Korean, Cyrillic and Latin
subrepertoires.

The {\clkwd coerce} function is extended to
allow for explicit coercion between base strings and extended strings.

During reader
construction of symbols, if all the characters
in the symbol's name are of type {\clkwd base-character},
then the name of the symbol will be stored as a base string.
Otherwise it will be stored as an extended string.
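
The base string rules above can be sketched outside LISP as follows.
This is a minimal illustration, assuming a base repertoire of the
first 256 codes; that limit and every name in the block are
hypothetical.  The proposal says only that inserting an extended
character into a base string ``is an error''; the sketch chooses to
signal it.

```python
# Sketch only: BASE_LIMIT and every name below are assumptions for illustration.
BASE_LIMIT = 256  # assumed size of the base character repertoire


def is_base_char(ch):
    """True if ch belongs to the assumed base repertoire."""
    return ord(ch) < BASE_LIMIT


class BaseString:
    """A string that may hold base characters only."""

    def __init__(self, text=""):
        self._chars = []
        for ch in text:
            self.append(ch)

    def append(self, ch):
        # Inserting an extended character into a base string "is an error";
        # this sketch chooses to signal it.
        if not is_base_char(ch):
            raise TypeError("extended character %r in base string" % ch)
        self._chars.append(ch)

    def __str__(self):
        return "".join(self._chars)


def coerce_to_base(text):
    """Explicit coercion of a general string to a base string (may raise)."""
    return BaseString(text)


def symbol_name(text):
    """Store the name as a base string when every character is a base character."""
    if all(is_base_char(ch) for ch in text):
        return BaseString(text)
    return text  # a plain str stands in for an extended string
```

Here a name made up wholly of base characters is stored in the
restricted representation, while any other name falls back to the
general one, mirroring the reader behavior described above.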

The base string type allows for more compact representation of strings
of base characters, which are likely to predominate in any system.
Note that in any particular implementation the base character set
need not be the
most compactly representable character set, since another might have
a smaller repertoire.
However, in most implementations base strings are
likely to be more space efficient than extended strings.

It has been suggested that either a single string type is
sufficient for large character set Common LISP implementations,
or that a hierarchy of string types could be used, in a manner
transparent to the user.  A desire to flexibly support many different
character sets without compromising the efficiency of ordinary
applications led us to accept the need for more than one string type.
We believe that these choices reflect a minimal
modification of this aspect of the type system, and that
exposing the string types for user programs to negotiate in their own
way is the most reasonable approach.


%----------------------------------------------------------------------
\section{Streams and System I/O}

Much of the work of ensuring that a
Common LISP implementation operates
correctly in a multiple character set environment must be performed by
the I/O interface.
The system I/O interface, abstracted in
Common LISP as streams, is responsible
for ensuring that text input from outside LISP is properly mapped
into character sets internally, and that the inverse mapping
\footnote{Such an inverse may not exist.
An implementation might legally fold multiple
external character sets into a single internal set on input
(e.g. EBCDIC and ASCII).
}
is performed on output.  It is beyond the scope of a language
definition to specify the details of this operation, but options
are specified which allow runtime indication from the user as to
what character sets a stream uses, and how the mappings
should be done.  It is expected that implementations will provide
reasonable defaults and invocation options to accommodate desired use
at an installation.

Two keyword arguments are proposed as additions to {\clkwd open}:
\begin{itemize}
\item {\clkwd :character-set}
whose value would be:
\begin{itemize}
\item A name or list of names of
defined character sets in the form of keywords.
The default is the base character set when
{\clkwd :external-code-format} is also defaulted.  If a non-default
value is specified for {\clkwd :external-code-format}, there may be a
different default for {\clkwd :character-set}.
\end{itemize}
\item {\clkwd :external-code-format}
whose value would be:
\begin{itemize}
\item
A keyword indicating an implementation recognized scheme for
representing 1 or more character sets with non-homogeneous codes.
\footnote{
For example, the SO/SI SBCS/DBCS convention used by IBM on 370
machines could be selected by a keyword like
{\clkwd :shift-delimited}.
The compact run-encoding convention defined by XEROX could be
selected by {\clkwd :run-encoded}.
The SBCS/DBCS convention based on
ASCII which uses leading bit patterns to distinguish two-byte codes
from one-byte codes could be selected by a keyword like
{\clkwd :high-byte-delimited}.
}
The default is the natural system character representation,
the base character representation.
As many {\clkwd :character-set} names must be provided as the
implementation requires for that external coding convention.
\footnote{
For example, if {\clkwd :shift-delimited} were the
{\clkwd :external-code-format} argument, two character set specifiers
would have to be provided.
}
\end{itemize}
\end{itemize}

These arguments are provided for input, output, and
bidirectional streams.  All characters read from the streams will be
members of the character sets specified by the {\clkwd :character-set}
argument.  It is an error to try to write a character other than a
member of
the specified sets to a stream.  (This includes the
\#$\backslash${\clkwd Newline} character.
Implementations should provide for appropriate line division behavior
through the function {\clkwd terpri}.)

An implementation supporting multiple character sets
must allow for the external and
internal representation of characters to be separately (and perhaps
multiply) specified to {\clkwd open},
since there can be circumstances under
which more than one external representation for an internal character
set is in use, or more than one character set is mixed together in an
external representation convention.

In addition to supporting conversion at the system interface, the
language must allow user programs to determine how much space data
objects will require when output in whichever external representations
are available.

The new function {\clkwd external-width} takes a character object
or string as its required argument.  It also takes an optional
{\em output-stream}.
It returns the number of host system character
representation quantum units
\footnote{
Same as the storage width of a base character, usually a byte.
}
required to externally store that object, using the indicated
representation convention.  If the item cannot be represented in
that convention, the function returns {\clkwd nil}.
This function is necessary
to determine if internal strings can be written to fixed length
fields in databases or terminal screen templates.  Note that this
function addresses the problem of storage width, and does not
address the problem of display width, which may involve calculating
screen width of strings printed in proportional fonts.
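
A rough sketch of the intended behavior, written in Python rather
than LISP, is given below.  It assumes that the quantum unit is a
byte and lets a codec name stand in for the external code format of
the optional output stream; the function names and the default codec
are assumptions, not part of the proposal.

```python
def external_width(obj, encoding="latin-1"):
    """Return the number of bytes needed to store obj externally.

    `encoding` stands in for the stream's external code format
    (an assumption of this sketch).  Returns None when obj cannot
    be represented in that convention.
    """
    try:
        return len(obj.encode(encoding))
    except UnicodeEncodeError:
        return None


def fits_field(obj, field_bytes, encoding="latin-1"):
    """True if obj can be written into a fixed-length field of field_bytes."""
    width = external_width(obj, encoding)
    return width is not None and width <= field_bytes
```

The second function shows the motivating use: checking whether an
internal string fits a fixed-length database field before writing it.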

A new global variable {\clkwd *format-external-width*} is
introduced to direct
the {\clkwd format} function to
take the {\clkwd :external-code-format} of the associated
stream argument into account.  This allows the directive parameters
to be used in producing columnar output, as long as the width
in bytes of the external code format corresponds to the
resulting width of the displayed or printed output.

%----------------------------------------------------------------------

∂23-Sep-88  1044	CL-Characters-mailer 	proposal part 2   
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 23 Sep 88  10:40:28 PDT
Date: Fri, 23 Sep 88 09:56:04 PDT
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880923.095604.baggins@IBM.com>
Subject: proposal part 2



%----------------------------------------------------------------------

\newcommand{\edithead}{\begin{tabular}{l p{3.95in}}
  \multicolumn{2}{l} }

\newcommand{\csdag}{\bf$\Rightarrow$\ddag}

\newcommand{\editstart}{}

\newcommand{\editend}{\\ & \end{tabular}}

%----------------------------------------------------------------------
%----------------------------------------------------------------------
\appendix
\chapter{Editorial Modifications to CLtL}

The following sections specify the editorial changes needed in
CLtL to support the proposal.  Section/subsection numbers and titles
match those found in \cite{steele84}.  The notation
{\csdag x} denotes a reference to paragraph x within the
subsection (we count each individual example or metastatement
as 1 paragraph of text).  When an entire paragraph is deleted,
the first few words of the paragraph are noted as an aid in
identifying the text location.


%----------------------------------------------------------------------
\setcounter{section}{1}
\section{Data Types}                        % 2
%----------------------------------------------------------------------


\edithead {\csdag 8}
\editstart
\\ \bf replace &
\cltxt
   rich character set, including ways to represent characters of various
   type styles.
\\ \bf with &
\cltxt
   rich character repertoire.
\editend

\setcounter{subsection}{1}
\subsection{Characters}                     % 2.2.

\edithead {\csdag 1}
\editstart
\\ \bf replace &
\cltxt
  Characters are represented as data objects of type {\clkwd character}.
  There are two subtypes of interest, called
  {\clkwd standard-char} and {\clkwd string-char}.
\\ \bf with &
\cltxt
  Characters are represented as data objects of type
  {\clkwd character}.
\editend
\\
\edithead {\csdag 2}
\editstart
\\ \bf replace &
\cltxt
  This works well enough for printing characters. Non-printing
  characters
\\ \bf with &
\cltxt
  This works well enough for graphic characters.  Non-graphic
  characters
\editend

\subsubsection{Standard Characters}         % 2.2.1.

\edithead {\csdag 0 section heading}
\editstart
\\ \bf replace &
\cltxt
  Standard Characters
\\ \bf with &
\cltxt
  Base Characters
\editend
\\
\edithead {\csdag 1 before}
\editstart
\\ \bf insert &
\cltxt
  Most computers have some "base" character representation which
  is a function
  of hardware instructions for dealing with characters, as well as
  the organization of
  the file system.  This base character representation is likely
  to be the smallest
  transaction unit permitted for text stream I/O operations.
  The base character representation (often a byte) supports an
  implementation specific
  {\em coded base character set} such as the ASCII and the EBCDIC
  coded character sets.
  The {\em base character repertoire} is defined as
  the collection of characters
  contained in the coded base character set.  Common LISP does
  not define the base
  character encoding
  but does require all implementations to support a "standard"
  {\em subrepertoire} of the base character
  repertoire.
\editend
\\
\edithead {\csdag 1 before}
\editstart
\\ \bf insert &
\cltxt
  The {\clkwd base-character} type is defined as a subtype of
  {\clkwd character}.  A {\clkwd base-character}
  object can contain any member of the base character repertoire.
  Objects of type
  {\clkwd (and character (not base-character))} are referred to
  as {\em extended characters}.
\editend
\\
\edithead {\csdag 1}
\editstart
\\ \bf delete &
\cltxt
  Common LISP defines a "standard character set" ...
\editend
\\
\edithead {\csdag 1}
\editstart
\\ \bf new &
\cltxt
  As a subset of the base character repertoire,
  Common LISP defines a standard character
  subrepertoire for two purposes.
  Common LISP programs that are written in the
  standard character subrepertoire
  can be read by any Common LISP implementation; and Common LISP
  programs
  that use only standard characters as data objects are most likely
  to be portable.
  The standard characters are not defined by their glyphs, but by their
  roles within
  the language.  There are two aspects to the roles of the
  standard characters:
  one is their role in reader and format control
  string syntax; the second is their role as
  components of the names of all Common LISP
  functions, macros, constants, and global
  variables.  As long as an implementation chooses 96 characters
  and treats those 96 in a manner
  consistent with the language's specification for the standard characters
  (for example,
  the naming of functions),
  it doesn't matter what glyphs the I/O
  hardware uses to
  represent those characters: they are
  the standard characters.  Any program or
  data text written wholly
  in those characters
  is portable through simple code conversion.
  The Common LISP
  standard character subrepertoire
  consists of a space character \#$\backslash${\clkwd Space}, a newline
  \#$\backslash${\clkwd Newline}, and the
  following ninety-four graphic characters or their equivalents:
\editend
\\
\edithead {\csdag 2}
\editstart
\\ \bf delete &
\cltxt
  ! " \# ...
\editend
\\
\edithead {\csdag 2 new}
\editstart
\\ &
  {\bf Common LISP Standard Character Subrepertoire}
\editend
\footnote{\cltxt \#$\backslash${\clkwd Space}
and \#$\backslash${\clkwd Newline} are omitted.
Graphic identifiers and descriptions are from ISO 6937/2.}
\\
{\small \begin{tabular}{||l|c|l||l|c|l||}    \hline
  ID     &    Glyph    &  Name or description
& ID     &    Glyph    &  Name or description
\\ \hline
  LA01  &  a  &  small a
& ND01  &  1  &  digit 1
\\ \hline
  LA02  &  A  &  capital A
& ND02  &  2  &  digit 2
\\ \hline
  LB01  &  b  &  small b
& ND03  &  3  &  digit 3
\\ \hline
  LB02  &  B  &  capital B
& ND04  &  4  &  digit 4
\\ \hline
  LC01  &  c  &  small c
& ND05  &  5  &  digit 5
\\ \hline
  LC02  &  C  &  capital C
& ND06  &  6  &  digit 6
\\ \hline
  LD01  &  d  &  small d
& ND07  &  7  &  digit 7
\\ \hline
  LD02  &  D  &  capital D
& ND08  &  8  &  digit 8
\\ \hline
  LE01  &  e  &  small e
& ND09  &  9  &  digit 9
\\ \hline
  LE02  &  E  &  capital E
& ND00  &  0  &  digit 0
\\ \hline
  LF01  &  f  &  small f
& SC03  &  \$    &  dollar sign
\\ \hline
  LF02  &  F  &  capital F
& SP02  &  !     &  exclamation mark
\\ \hline
  LG01  &  g  &  small g
& SP04  &  "     &  quotation mark
\\ \hline
  LG02  &  G  &  capital G
& SP05  &  \apostrophe     &  apostrophe
\\ \hline
  LH01  &  h  &  small h
& SP06  &  (     &  left parenthesis
\\ \hline
  LH02  &  H  &  capital H
& SP07  &  )     &  right parenthesis
\\ \hline
  LI01  &  i  &  small i
& SP08  &  ,     &  comma
\\ \hline
  LI02  &  I  &  capital I
& SP09  &  \_    &  low line
\\ \hline
  LJ01  &  j  &  small j
& SP10  &  -     &  hyphen or minus sign
\\ \hline
  LJ02  &  J  &  capital J
& SP11  &  .     &  full stop, period
\\ \hline
  LK01  &  k  &  small k
& SP12  &  /     &  solidus
\\ \hline
  LK02  &  K  &  capital K
& SP13  &  :     &  colon
\\ \hline
  LL01  &  l  &  small l
& SP14  &  ;     &  semicolon
\\ \hline
  LL02  &  L  &  capital L
& SP15  &  ?     &  question mark
\\ \hline
  LM01  &  m  &  small m
& SA01  &  +     &  plus sign
\\ \hline
  LM02  &  M  &  capital M
& SA03  &  $<$   &  less-than sign
\\ \hline
  LN01  &  n  &  small n
& SA04  &  =   &  equals sign
\\ \hline
  LN02  &  N  &  capital N
& SA05  &  $>$   &  greater-than sign
\\ \hline
  LO01  &  o  &  small o
& SM01  &  \#    &  number sign
\\ \hline
  LO02  &  O  &  capital O
& SM02  &  \%    &  percent sign
\\ \hline
  LP01  &  p  &  small p
& SM03  &  \&    &  ampersand
\\ \hline
  LP02  &  P  &  capital P
& SM04  &  *     &  asterisk
\\ \hline
  LQ01  &  q  &  small q
& SM05  &  @     &  commercial at
\\ \hline
  LQ02  &  Q  &  capital Q
& SM06  &  [     &  left square bracket
\\ \hline
  LR01  &  r  &  small r
& SM07  &  $\backslash$   &  reverse solidus
\\ \hline
  LR02  &  R  &  capital R
& SM08  &  ]     &  right square bracket
\\ \hline
  LS01  &  s  &  small s
& SM11  &  \{    &  left curly bracket
\\ \hline
  LS02  &  S  &  capital S
& SM13  &  $|$     &  vertical bar
\\ \hline
  LT01  &  t  &  small t
& SM14  &  \}    &  right curly bracket
\\ \hline
  LT02  &  T  &  capital T
& SD13  &  \bq   &  grave accent
\\ \hline
  LU01  &  u  &  small u
& SD15  &  $\hat{ }$  &  circumflex accent
\\ \hline
  LU02  &  U  &  capital U
& SD19  &  $\tilde{ }$ &  tilde
\\ \hline
  LV01  &  v  &  small v
& & &
\\ \hline
  LV02  &  V  &  capital V
& & &
\\ \hline
  LW01  &  w  &  small w
& & &
\\ \hline
  LW02  &  W  &  capital W
& & &
\\ \hline
  LX01  &  x  &  small x
& & &
\\ \hline
  LX02  &  X  &  capital X
& & &
\\ \hline
  LY01  &  y  &  small y
& & &
\\ \hline
  LY02  &  Y  &  capital Y
& & &
\\ \hline
  LZ01  &  z  &  small z
& & &
\\ \hline
  LZ02  &  Z  &  capital Z
& & &
\\
\hline
\end{tabular} }
\\
\edithead {\csdag 3}
\editstart
\\ \bf delete &
\cltxt
  @ A B C...
\editend
\\
\edithead {\csdag 4}
\editstart
\\ \bf delete &
\cltxt
  \bq a b c...
\editend
\\
\edithead {\csdag 5}
\editstart
\\ \bf delete &
\cltxt
  The Common LISP Standard character set is apparently ...
\editend
\\
\edithead {\csdag 6}
\editstart
\\ \bf replace &
\cltxt
  Of the ninety-four non-blank printing characters
\\ \bf with &
\cltxt
  Of the ninety-four graphic characters
\editend
\\
\edithead {\csdag 9}
\editstart
\\ \bf delete &
\cltxt
  The following characters are called ...
\editend
\\
\edithead {\csdag 10}
\editstart
\\ \bf delete &
\cltxt
  {\clkwd \#$\backslash$Backspace \#$\backslash$Tab } ...
\editend
\\
\edithead {\csdag 11}
\editstart
\\ \bf delete &
\cltxt
  Not all implementations of Common ...
\editend

\subsubsection{Line Divisions}              % 2.2.2.
\subsubsection{Non-standard Characters}     % 2.2.3.

\edithead {\csdag delete entire section}
\editstart
\editend

\subsubsection{Character Attributes}        % 2.2.4.

\edithead {\csdag 0 section heading}
\editstart
\\ \bf replace &
\cltxt
  Character Attributes
\\ \bf with &
\cltxt
  Character Identity
\editend
\\
\edithead {\csdag 1 through 8}
\editstart
\\ \bf delete all paragraphs&
\cltxt
  Every object of type {\clkwd character} ...
\editend
\\
\edithead {\csdag 1}
\editstart
\\ \bf new &
\cltxt
A data object of type {\clkwd character} is identified by its
{\em character code}, a unique numerical code identification.
Each character code is composed from
a {\em character set identifier},
shared by all characters of a particular character
set, and a {\em character set index}, a numerical identification which
is unique within a particular character set.
\\ &
An implementation need
not support more than one character set, the {\em base} character set.
If it does support multiple
character sets, it must define the sets supported and
their characteristics.  Character set identifiers are assigned to
character sets by the implementation.
The convention by which the character set index
and character set identifier are composed into a single integer code
is implementation dependent.
\\ &
Characters within the base character set are referred to as
{\em base characters}.  Characters not in the base character set
are referred to as {\em extended characters}.
\\ &
\\ & \bf Compatibility note:  -------------
\\ &
For compatibility with earlier versions of Common LISP incorporating
various attributes of character objects, see section 13 for a
discussion of implementation-dependent attributes.
\\ & \bf --------------------------------------------
\editend

\subsubsection{String Characters}           % 2.2.5.

\edithead {\csdag delete entire section}
\editstart
\editend

\subsection{Symbols}                        % 2.3.

\edithead {\csdag 12}
\editstart
\\ \bf replace &
\cltxt
  A symbol may have uppercase letters, lowercase letters, or both
  in its print name.
\\ \bf with &
\cltxt
  A symbol may have characters from any supported character repertoire
  in its print name.
  It may have uppercase letters, lowercase letters, or both.
\editend

\setcounter{subsection}{4}
\subsection{Arrays}
\subsubsection{Vectors}

\edithead {\csdag 6}
\editstart
\\ \bf replace &
\cltxt
  All implementations provide specialized arrays for the cases when
  the components are characters (or rather, a special subset of the
  characters);
\\ \bf with &
\cltxt
  All implementations provide specialized arrays for the cases when
  the components are characters (or optionally, special subsets of
  the characters);
\editend

\subsubsection{Strings}

\edithead {\csdag 1}
\editstart
\\ \bf replace &
\cltxt
  A string is simply a vector of characters.  More precisely, a string
  is a specialized vector whose elements are of type
  {\clkwd string-char}.
\\ \bf with &
\cltxt
  A string is simply a vector of characters.  More precisely, a string
  is a specialized vector whose elements are of type
  {\clkwd character} or a subtype
  of character.
\editend

\setcounter{subsection}{14}
\subsection{Overlap, Inclusion, and Disjointness of Types} % 2.15.

\edithead {\csdag 14}
\editstart
\\ \bf replace &
\cltxt
  The type {\clkwd standard-char} is a subtype of {\clkwd string-char};
  {\clkwd string-char} is a subtype of {\clkwd character}.
\\ \bf with &
\\ & \bf Compatibility note:  -------------
\\ &
\cltxt
  The type {\clkwd standard-char} is a subtype of
  {\clkwd base-character};
  the type {\clkwd string-char} means {\clkwd character}.  Both
  are retained for compatibility with earlier versions of Common LISP.
\\ & \bf --------------------------------------------
\editend
\\
\edithead {\csdag 15}
\editstart
\\ \bf replace &
\cltxt
  The type {\clkwd string} is a subtype of {\clkwd vector},
  for {\clkwd string} means {\clkwd (vector string-char)}.
\\ \bf with &
\cltxt
  The type {\clkwd string} is a subtype of {\clkwd vector};
  {\clkwd string} consists of vectors specialized by subtypes of
  {\clkwd character}.
\editend
\\
\edithead {\csdag 15 after}
\editstart
\\ \bf insert &
\cltxt
  The type {\clkwd base-string} means
  {\clkwd (vector base-character)}.
\editend
\\
\edithead {\csdag 15 after}
\editstart
\\ \bf insert &
\cltxt
  The type {\clkwd most-general-string} means
  {\clkwd (vector character)} and is a subtype of {\clkwd string}.
\editend
\\
\edithead {\csdag 20}
\editstart
\\ \bf replace &
\cltxt
  {\clkwd (simple-array string-char (*))};
\\ \bf with &
\cltxt
  {\clkwd (simple-array character (*))};
\editend
\\
\edithead {\csdag 20 after}
\editstart
\\ \bf insert &
\cltxt
  The type {\clkwd simple-base-string} means
  {\clkwd (simple-array base-character (*))} and
  is the most efficient string which can hold
  the standard character repertoire.
\editend



%----------------------------------------------------------------------
\setcounter{section}{3}
\section{Type Specifiers}                   % 4
%----------------------------------------------------------------------
\setcounter{subsection}{1}
\subsection{Type Specifier Lists} % 4.2.


\edithead {\csdag 8 Table 4-1 (alphabetic list)}
\editstart
\\ \bf remove &
\\ &
\cltxt
  {\clkwd standard-char}
\\ &
  {\clkwd string-char}
\editend
\\
\edithead {\csdag 8 Table 4-1 (alphabetic list)}
\editstart
\\ \bf insert &
\\ &
\cltxt
  {\clkwd base-character}
\\ &
  {\clkwd most-general-string}
\\ &
  {\clkwd simple-base-string}
\editend

\setcounter{subsection}{2}
\subsection{Predicating Type Specifiers} % 4.3.

\edithead {\csdag 2}
\editstart
\\ \bf delete &
\cltxt
  As an example, the entire ...
\editend
\\
\edithead {\csdag 3 delete example}
\editstart
\\ \bf delete &
\cltxt
  {\clkwd (deftype string-char () } ...
\editend

\setcounter{subsection}{5}
\subsection{Type Specifiers That Abbreviate} % 4.6.

\edithead {\csdag 20}
\editstart
\\ \bf replace &
\cltxt
  Means the same as {\clkwd (array string-char ({\em size}))}: the set of
  strings of
  the indicated size.
\\ \bf with &
\cltxt
  Means the union of the vector types specialized by subtypes of
  {\clkwd character},
  each of the indicated size.
\editend
\\
\edithead {\csdag 23}
\editstart
\\ \bf replace &
\cltxt
  Means the same as {\clkwd (simple-array string-char ({\em size}))}: the
  set of simple strings of the indicated size.
\\ \bf with &
\cltxt
  Means the same as {\clkwd (simple-array character ({\em size}))}: the
  set of simple strings of the indicated size.
\editend
\\
\edithead {\csdag 23 after}
\editstart
\\ \bf insert &
\cltxt
  {\clkwd (base-string {\em size})}
\\ &
  Means the same as {\clkwd (array base-character ({\em size}))}: the
  set of base strings of the indicated size.
\\ &
  {\clkwd (simple-base-string {\em size})}
\\ &
  Means the same as {\clkwd (simple-array base-character ({\em size}))}:
  the set of simple base strings of the indicated size.
\editend

\setcounter{subsection}{7}
\subsection{Type Conversion Function} % 4.8.

\edithead {\csdag 6}
\editstart
\\ \bf replace &
\cltxt
  then the sole element of the print name is returned.
  If {\em object} is an integer {\em n}, then {\clkwd (int-char }
  {\em n}{\clkwd )} is returned.  See {\clkwd character}.
\\ \bf with &
\cltxt
  then the sole element of the print name is returned.
  If {\em object} is an integer {\em n}, then {\clkwd (code-char }
  {\em n}{\clkwd )} is returned.  See {\clkwd character}.
\editend
\\
\edithead {\csdag 6 after}
\editstart
\\ \bf insert &
\begin{itemize}
\cltxt
\item Any string subtype may be converted to any other string
subtype, provided the new string can contain all actual
elements of the old string.  It is an error if it cannot.
\end{itemize}
\editend
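
The string conversion rule above can be modeled in a few lines. The following is a minimal Python sketch and not part of the proposal. It assumes a hypothetical base repertoire of codes below 128, and the names BASE_CODE_LIMIT, fits_base_string and coerce_to_base_string are invented for illustration. The sketch raises an exception in the error case, although "it is an error" does not oblige an implementation to signal anything.

```python
# Illustrative model only. BASE_CODE_LIMIT and the function names are
# hypothetical; the 128-code base repertoire is an assumption.
BASE_CODE_LIMIT = 128


def fits_base_string(text):
    """True if every element of text lies in the assumed base repertoire."""
    return all(ord(ch) < BASE_CODE_LIMIT for ch in text)


def coerce_to_base_string(text):
    """Convert a general string to the narrower base string subtype.

    The conversion is defined only when the target can hold every actual
    element of the source; this model raises TypeError otherwise.
    """
    if not fits_base_string(text):
        raise TypeError("string contains elements outside the base repertoire")
    return text
```

The check looks at the actual elements of the old string, not at its declared element type, which is what the rule above requires.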


%----------------------------------------------------------------------
\setcounter{section}{5}
\section{Predicates}                        % 6
%----------------------------------------------------------------------
\edithead {\csdag 2}
\editstart
\\ \bf replace &
\cltxt
  but {\clkwd standard-char} begets {\clkwd standard-char-p}
\\ \bf with &
\cltxt
  but {\clkwd bit-vector} begets {\clkwd bit-vector-p}
\editend

\setcounter{subsection}{1}
\subsection{Data Type Predicates} % 6.2.

\setcounter{subsubsection}{1}
\subsubsection{Specific Data Type Predicates} % 6.2.2.

\edithead {\csdag 36}
\editstart
\\ \bf replace &
\cltxt
  {\clkwd characterp} {\em object}
\\ \bf with &
\cltxt
  {\clkwd characterp} {\em object} \&{\clkwd optional}
  ({\em repertoire})
\editend
\\
\edithead {\csdag 37}
\editstart
\\ \bf replace &
\cltxt
  {\clkwd characterp} is true if its argument is a character,
  and otherwise is false.
\\ \bf with &
\cltxt
  If {\em repertoire} is omitted, {\clkwd characterp}
  is true if its argument is a character object,
  and otherwise is false.
  If a {\em repertoire} keyword argument is specified,
  {\clkwd characterp} is true if its argument
  is a character object and a member of the specified repertoire
  or subrepertoire, and
  otherwise is false.
  For example, {\clkwd (characterp  \#$\backslash$A}
  {\clkwd :standard)}
  is true since \#$\backslash$A is a member of the standard character
  subrepertoire.
\editend
\\
\edithead {\csdag 38}
\editstart
\\ \bf replace &
\cltxt
  {\clkwd (characterp x) $\equiv$ (typep x \apostrophe character)}
\\ \bf with &
\cltxt
  {\clkwd (characterp x :standard) $\equiv$ (typep x \apostrophe
  (character :standard))}
\editend
\\
\edithead {\csdag 72}
\editstart
\\ \bf replace &
\cltxt
  See also {\clkwd standard-char-p, string-char-p, streamp,}
\\ \bf with &
\cltxt
  See also {\clkwd standard-char-p, streamp,}
\editend
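
The extended behavior of characterp described above can be sketched as a small Python model. This is an illustration under stated assumptions, not a definition. Characters are modeled as one-element Python strings, the table REPERTOIRES is hypothetical, and the standard subrepertoire is taken to be the 95 printing ASCII characters (including space) plus newline.

```python
# Illustrative model only. REPERTOIRES is a hypothetical registry; the
# standard subrepertoire is modeled as printing ASCII plus newline.
STANDARD = frozenset(chr(code) for code in range(0x20, 0x7F)) | {"\n"}
REPERTOIRES = {"standard": STANDARD}


def characterp(obj, repertoire=None):
    """Model of the two-argument characterp.

    Without a repertoire, any character object qualifies. With one, the
    object must also be a member of the named repertoire.
    """
    is_character = isinstance(obj, str) and len(obj) == 1
    if repertoire is None:
        return is_character
    return is_character and obj in REPERTOIRES[repertoire]
```

In this model characterp("A", "standard") is true, while the tab character is a character but not a member of the standard subrepertoire.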

\setcounter{subsubsection}{2}
\subsubsection{Equality Predicates} % 6.2.3.

\edithead {\csdag 75}
\editstart
\\ \bf replace &
\cltxt
  which ignores alphabetic case and certain other attributes
  of characters;
\\ \bf with &
\cltxt
  which ignores alphabetic case
  of characters;
\editend

%----------------------------------------------------------------------
\setcounter{section}{6}
\section{Control Structure}                 % 7
%----------------------------------------------------------------------

\setcounter{subsection}{1}
\subsection{Generalized Variables} % 7.2.

\edithead {\csdag 19 modify table}
\editstart
\\ \bf replace &
\cltxt
  char               string-char
\\ &
  schar              string-char
\\ \bf with &
\cltxt
  char               character
\\ &
  schar              character
\editend
\\
\edithead {\csdag 22 table entry}
\editstart
\\ \bf delete &
\cltxt
  char-bit           first                  set-char-bit
\editend

%----------------------------------------------------------------------
\setcounter{section}{9}
\section{Symbols}                           % 10
%----------------------------------------------------------------------

\edithead {\csdag 3}
\editstart
\\ \bf replace &
\cltxt
  It is ordinarily not permitted to alter a symbol's print name.
\\ \bf with &
\cltxt
  It is an error to alter a symbol's print name.
\editend

\setcounter{subsection}{1}
\subsection{The Print Name} % 10.2.

\edithead {\csdag 5}
\editstart
\\ \bf replace &
\cltxt
  It is an extremely bad idea
\\ \bf with &
\cltxt
  It is an error and an extremely bad idea
\editend

%----------------------------------------------------------------------
\setcounter{section}{12}
\section{Characters}                        % 13
%----------------------------------------------------------------------

\edithead {\csdag 6 after}
\editstart
\\ \bf insert &
\cltxt
  {\clkwd char-code-limit}   [{\clkwd Constant}]
\\ &
  The value of {\clkwd char-code-limit} is a non-negative integer
  that is the upper exclusive bound on values produced by the
  function {\clkwd char-code}, which returns the {\em code}
  of a given character; that is, the values returned by
  {\clkwd char-code} are non-negative and strictly less than
  the value of {\clkwd char-code-limit}.
  There may be unassigned codes between 0 and
  {\clkwd char-code-limit} which
  are not legal arguments to {\clkwd code-char}.
\\ & \bf Compatibility note:  -------------
\\ &
  Earlier versions of Common LISP incorporated {\em font} and
  {\em bits} as attributes of character objects.  These are considered
  implementation-defined
  attributes of character objects and, if supported by an implementation,
  affect the action of selected functions:
\begin{itemize}
\item Attributes, such as those
  dealing with how the character is displayed or its typography,
  are not part of the character code.
  For example, bold-face, color
  or size are not considered part of the character code.
\item If two characters differ in any implementation-defined attributes,
  then they are not {\clkwd char=}.
\item If two characters have identical implementation-defined
  attributes, then their ordering by
  {\clkwd char}$<$ is consistent with the numerical ordering by the
  predicate $<$ on
  their code attributes. (Similarly for {\clkwd char}$>$,
  {\clkwd char}$>=$ and {\clkwd char}$<=$.)
\item {\clkwd char-equal} ignores implementation-defined attributes.
\item The effect of {\clkwd char-upcase} and {\clkwd char-downcase}
  is to preserve implementation-defined attributes.
\item The function {\clkwd char-int} is equivalent to {\clkwd char-code}
  if no implementation-defined attributes are associated with
  the character object.
\item The function {\clkwd int-char} is equivalent to {\clkwd code-char}
  if no implementation-defined attributes are associated with
  the character object.
\item It is implementation dependent whether characters within
  double quotes have implementation-defined attributes removed.
\item  In symbol construction, implementation-defined attributes such as
  color are removed.
\end{itemize}
\\ & \bf --------------------------------------------
\editend
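
The rules in the compatibility note above can be made concrete with a small Python model. This is a sketch under stated assumptions and not part of the proposal. The Char record, its attrs field and the attribute name "bold" are hypothetical. Comparing the order of characters with differing attributes is left unmodeled, since the note constrains only the case of identical attributes.

```python
from collections import namedtuple

# Illustrative model only: a character is a code plus a tuple of
# implementation-defined attributes (a hypothetical representation).
Char = namedtuple("Char", ["code", "attrs"], defaults=[()])


def char_eq(a, b):
    """Model of char=: characters differing in any attribute are not char=."""
    return a.code == b.code and a.attrs == b.attrs


def char_lt(a, b):
    """Model of char<: follows code order when the attributes are identical."""
    if a.attrs != b.attrs:
        raise ValueError("ordering across differing attributes is not modeled")
    return a.code < b.code


def char_equal(a, b):
    """Model of char-equal: ignores case and implementation-defined attributes."""
    return chr(a.code).lower() == chr(b.code).lower()


def char_upcase(ch):
    """Model of char-upcase: the code may change, the attributes are kept."""
    upper = chr(ch.code).upper()
    # Leave the code alone when upcasing has no single-character result.
    code = ord(upper) if len(upper) == 1 else ch.code
    return Char(code, ch.attrs)
```

With plain = Char(ord("a")) and bold = Char(ord("a"), ("bold",)), the two are char-equal but not char=, and upcasing bold yields an "A" that is still bold.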

\setcounter{subsection}{0}
\subsection{Character Attributes} % 13.1.

\edithead {\csdag delete entire section}
\editstart
\editend

\setcounter{subsection}{1}
\subsection{Predicates on Characters} % 13.2.


\edithead {\csdag 3}
\editstart
\\ \bf replace &
\cltxt
  argument is a "standard character" that is, an object of type
  {\clkwd standard-char}.
   Note that any character with a non-zero {\em bits} or {\em font}
   attribute
   is non-standard.
\\ \bf with &
\cltxt
  argument is a member of the Common LISP standard character subrepertoire.
\editend
\\
\edithead {\csdag 4}
\editstart
\\ \bf delete &
\cltxt
  Note that any character with non-zero ...
\editend
\\
\edithead {\csdag 6}
\editstart
\\ \bf replace &
\cltxt
  Of the standard characters all but \#$\backslash${\clkwd Newline}
  are graphic.
  The semi-standard characters \#$\backslash${\clkwd Backspace},
  \#$\backslash${\clkwd Tab},
  \#$\backslash${\clkwd Rubout},
  \#$\backslash${\clkwd Linefeed},
  \#$\backslash${\clkwd Return},
  and \#$\backslash${\clkwd Page} are not graphic.
\\ \bf with &
\cltxt
  Of the standard characters all but \#$\backslash${\clkwd Newline}
  are graphic.
\editend
\\
\edithead {\csdag 7}
\editstart
\\ \bf delete &
\cltxt
  Programs may assume that graphic ...
\editend
\\
\edithead {\csdag 8}
\editstart
\\ \bf delete &
\cltxt
  Any character with a non-zero bits...
\editend
\\
\edithead {\csdag 9}
\editstart
\\ \bf delete &
\cltxt
  {\clkwd string-char-p} ...
\editend
\\
\edithead {\csdag 10}
\editstart
\\ \bf delete &
\cltxt
  The argument {\em char} must be ...
\editend
\\
\edithead {\csdag 13}
\editstart
\\ \bf replace &
\cltxt
  If a character is alphabetic, then it is perforce graphic.  Therefore
  any character
  with a non-zero bits attribute cannot be alphabetic.  Whether a
  character is
  alphabetic may depend on its font number.
\\ \bf with &
\cltxt
  If a character is alphabetic, then it is perforce graphic.
\editend
\\
\edithead {\csdag 22}
\editstart
\\ \bf replace &
\cltxt
  If a character is either uppercase or lowercase, it is necessarily
  alphabetic (and
  therefore is graphic, and therefore has a zero bits attribute).
  However, it is permissible in theory for an alphabetic character
  to be neither
  uppercase nor lowercase (in a non-Roman font, for example).
\\ \bf with &
\cltxt
  If a character is either uppercase or lowercase, it is necessarily
  alphabetic (and
  therefore is graphic).
\editend
\\
\edithead {\csdag 25}
\editstart
\\ \bf replace &
\cltxt
  The argument {\em char} must be a character object, and {\em radix}
  must be a non-negative
  integer. If {\em char} is not a digit of the radix specified
\\ \bf with &
\cltxt
  The argument {\em char} must be in the standard character
  subrepertoire and
  {\em radix} must be a non-negative integer.
  If {\em char} is not a standard character or is not a digit of the
  radix specified
\editend
\\
\edithead {\csdag 51}
\editstart
\\ \bf delete &
\cltxt
  If two characters have the same bits ...
\editend
\\
\edithead {\csdag 52}
\editstart
\\ \bf replace &
\cltxt
  If two characters differ in any attribute (code, bits, or font), then
  they are different.
\\ \bf with &
\cltxt
  If the codes of two characters differ, then
  they are different.
\\ & \bf Compatibility note:  -------------
\\ &
  If two characters differ in any implementation-defined attributes,
  then they are different.
\\ & \bf --------------------------------------------
\editend
\\
\edithead {\csdag 94}
\editstart
\\ \bf replace &
\cltxt
  The predicate {\clkwd char-equal} is like {\clkwd char=}, and
  similarly for the others, except
  according to a different ordering such that differences of bits
  attributes and case are ignored, and font information is taken into
  account in an implementation dependent manner.
\\ \bf with &
\cltxt
  The predicate {\clkwd char-equal} is like {\clkwd char=}, and
  similarly for the others, except
  according to a different ordering such that differences of case and
  implementation-defined attributes are ignored.
\editend
\\
\edithead {\csdag 97 example}
\editstart
\\ \bf delete &
\cltxt
  {\clkwd (char-equal \#$\backslash$A \#$\backslash$Control-A) is true}
\editend
\\
\edithead {\csdag 98}
\editstart
\\ \bf delete &
\cltxt
  The ordering may depend on the font ...
\editend

\setcounter{subsection}{2}
\subsection{Character Construction and Selection} % 13.3.

\edithead {\csdag 3}
\editstart
\\ \bf replace &
\cltxt
  The argument {\em char} must be a character object.
  {\clkwd char-code} returns the {\em code} attribute of the
  character object;
  this will be a non-negative integer less than the (normal) value
\\ \bf with &
\cltxt
  The argument {\em char} must be a character object.
  {\clkwd char-code} returns the {\em code} of the
  character object;
  this will be a non-negative integer less than the value
\editend
\\
\edithead {\csdag 4}
\editstart
\\ \bf delete &
\cltxt
  {\clkwd char-bits } ...
\editend
\\
\edithead {\csdag 5}
\editstart
\\ \bf delete &
\cltxt
  The argument {\em char} must be ...
\editend
\\
\edithead {\csdag 6}
\editstart
\\ \bf delete &
\cltxt
  {\clkwd char-font } ...
\editend
\\
\edithead {\csdag 7}
\editstart
\\ \bf delete &
\cltxt
  The argument {\em char} must be ...
\editend
\\
\edithead {\csdag 8}
\editstart
\\ \bf replace &
\cltxt
  {\clkwd code-char {\em code} \&optional {\em (bits 0) (font 0)}
  [{\em Function}]}
\\ \bf with &
\cltxt
  {\clkwd code-char {\em code}
  [{\em Function}]}
\editend
\\
\edithead {\csdag 9}
\editstart
\\ \bf replace &
\cltxt
  All three arguments must be non-negative integers.  If it is possible
  in the
  implementation to construct a character object whose code attribute
  is {\em code},
  whose
  bits attribute is {\em bits}, and whose font attribute is {\em font},
  then such an object
  is returned;
\\ \bf with &
\cltxt
  The argument must be a non-negative integer.  If it is possible
  in the
  implementation to construct a character object identified by
  {\em code},
  then such an object is returned;
\editend
\\
\edithead {\csdag 10}
\editstart
\\ \bf replace &
\cltxt
  For any integers, {\em c, b,} and {\em f}, if {\clkwd (code-char
  {\em c b f})} is
\\ \bf with &
\cltxt
  For any integer, {\em c}, if {\clkwd (code-char
  {\em c})} is
\editend
\\
\edithead {\csdag 12}
\editstart
\\ \bf delete &
\cltxt
  {\clkwd (char-bits (code-char } ...
\editend
\\
\edithead {\csdag 13}
\editstart
\\ \bf delete &
\cltxt
  {\clkwd (char-font (code-char } ...
\editend
\\
\edithead {\csdag 14}
\editstart
\\ \bf delete &
\cltxt
  If the font and bits attributes ...
\editend
\\
\edithead {\csdag 15}
\editstart
\\ \bf delete &
\cltxt
  {\clkwd (char= (code-char (char-code ...}
\editend
\\
\edithead {\csdag 16}
\editstart
\\ \bf delete &
\cltxt
  is true.
\editend
\\
\edithead {\csdag 17}
\editstart
\\ \bf delete &
\cltxt
  {\clkwd make-char} ...
\editend
\\
\edithead {\csdag 18}
\editstart
\\ \bf delete &
\cltxt
 The argument {\em char} must be ...
\editend
\\
\edithead {\csdag 19}
\editstart
\\ \bf delete &
\cltxt
 If {\em bits} or {\em font} are zero ...
\editend

\setcounter{subsection}{3}
\subsection{Character Conversions} % 13.4.

\edithead {\csdag 8}
\editstart
\\ \bf replace &
\cltxt
  {\clkwd char-upcase} returns a character object with the same
  font and bits attributes as {\em char}, but with possibly a
  different code attribute.
\\ \bf with &
\cltxt
  {\clkwd char-upcase} returns a character object with possibly
  a different code.
\editend
\\
\edithead {\csdag 10}
\editstart
\\ \bf replace &
\cltxt
  Similarly, {\clkwd char-downcase} returns a character object with the
  same font and bits attributes as {\em char}, but with possibly a
  different code attribute.
\\ \bf with &
\cltxt
  Similarly, {\clkwd char-downcase} returns a character object with
  possibly a different code.
\editend
\\
\edithead {\csdag 12}
\editstart
\\ \bf delete &
\cltxt
  Note that the action of ...
\editend
\\
\edithead {\csdag 13}
\editstart
\\ \bf replace &
\cltxt
  {\clkwd digit-char {\em weight} \&optional ({\em radix} 10)
  ({\em font} 0)      [{\em Function}]}
\\ \bf with &
\cltxt
  {\clkwd digit-char {\em weight} \&optional ({\em radix} 10)
       [{\em Function}]}
\editend
\\
\edithead {\csdag 14}
\editstart
\\ \bf replace &
\cltxt
  All arguments must be integers.  {\clkwd digit-char} determines
  whether or not it is
  possible
  to construct a character object whose font attribute is {\em font},
  and whose {\em code}
\\ \bf with &
\cltxt
  All arguments must be integers.  {\clkwd digit-char} determines
  whether or not it is
  possible to construct a character object whose {\em code}
\editend
\\
\edithead {\csdag 15}
\editstart
\\ \bf replace &
\cltxt
  {\clkwd digit-char} cannot return {\clkwd nil} if {\em font}
  is zero, {\em radix}
\\ \bf with &
\cltxt
  {\clkwd digit-char} cannot return {\clkwd nil} if
  {\em radix}
\editend
\\
\edithead {\csdag 22}
\editstart
\\ \bf delete &
\cltxt
  Note that no argument is provided for ...
\editend
\\
\edithead {\csdag 22 after}
\editstart
\\ \bf insert &
\\ & \bf Compatibility note:  -------------
\\ &
  The {\clkwd char-int} and {\clkwd int-char} functions are retained
  for compatibility with earlier versions of Common LISP which support
  implementation-defined attributes.
\editend
\\
\edithead {\csdag 24}
\editstart
\\ \bf replace &
\cltxt
  The argument {\em char} must be a character object. {\clkwd char-int}
  returns a non-negative integer encoding the character object.
\\ \bf with &
\cltxt
  The argument {\em char} must be a character object. {\clkwd char-int}
  returns a non-negative integer encoding the character object
  including any implementation-defined attributes.
\editend
\\
\edithead {\csdag 25}
\editstart
\\ \bf replace &
\cltxt
  If the font and bits attributes of {\em char} are zero, then
\\ \bf with &
\cltxt
  If the implementation-defined attributes of {\em char} are zero, then
\editend
\\
\edithead {\csdag 30 after}
\editstart
\\ \bf insert &
\\ & \bf --------------------------------------------
\editend
\\
\edithead {\csdag 32}
\editstart
\\ \bf replace &
\cltxt
  All characters that have zero font and bits attributes and that are
  non-graphic
\\ \bf with &
\cltxt
  All characters that are
  non-graphic
\editend
\\
\edithead {\csdag 33}
\editstart
\\ \bf replace &
\cltxt
  The standard newline and space characters have the respective
  names {\clkwd Newline} and {\clkwd Space}.  The semi-standard
  characters have the names {\clkwd Tab, Page, Rubout, Linefeed,
  Return,} and {\clkwd Backspace}.
\\ \bf with &
\cltxt
  The standard newline and space characters have the respective
  names {\clkwd Newline} and {\clkwd Space}.
\editend
\\
\edithead {\csdag 35}
\editstart
\\ \bf delete &
\cltxt
  {\clkwd char-name} will only locate "simple" ...
\editend

\setcounter{subsection}{4}
\subsection{Character Control-Bit Functions} % 13.5.

\edithead {\csdag delete entire section}
\editstart
\editend

%----------------------------------------------------------------------
\setcounter{section}{13}
\section{Sequences}                         % 14
%----------------------------------------------------------------------
\setcounter{subsection}{0}
\subsection{Simple Sequence Functions}         % 14.1

\edithead {\csdag 24}
\editstart
\\ \bf append &
\cltxt
  If type {\clkwd string} is specified, a string of type
  {\clkwd most-general-string} is returned.
\editend

\setcounter{subsection}{1}
\subsection{Concatenating, Mapping, and Reducing Sequences}  % 14.2.

\edithead {\csdag 3}
\editstart
\\ \bf append &
\cltxt
  If {\em result-type} {\clkwd string} is specified, any string
  subtype which can hold the elements of the sequence can be returned.
\editend
\\
\edithead {\csdag 6}
\editstart
\\ \bf append &
\cltxt
  If {\em result-type} {\clkwd string} is specified, any string
  subtype which can hold the elements of the sequence can be returned.
\editend

\setcounter{subsection}{2}
\subsection{Modifying Sequences}  % 14.3.

\edithead {\csdag 29}
\editstart
\\ \bf append &
\cltxt
  If {\em newitem} is of type {\clkwd string}, any string subtype
  which can hold the elements of the result sequence can be returned.
\editend
\\
\edithead {\csdag 36}
\editstart
\\ \bf append &
\cltxt
  If {\em newitem} is of type {\clkwd string}, any string subtype
  which can hold the elements of the result sequence can be returned.
\editend

\setcounter{subsection}{4}
\subsection{Sorting and Merging}  % 14.5.

\edithead {\csdag 20}
\editstart
\\ \bf append &
\cltxt
  If {\em result-type} {\clkwd string} is specified, any string subtype
  which can hold the elements of the result sequence can be returned.
\editend

%----------------------------------------------------------------------
\setcounter{section}{17}
\section{Strings}                           % 18
%----------------------------------------------------------------------

\edithead {\csdag 1}
\editstart
\\ \bf replace &
\cltxt
  Specifically, the type {\clkwd string} is identical to the type
  {\clkwd (vector string-char),}
  which in turn is the same as {\clkwd (array string-char (*))}.
\\ \bf with &
\cltxt
  Specifically, the type {\clkwd string} is a subtype of
  {\clkwd vector}
  and consists of vectors specialized by subtypes of {\clkwd character}.
\editend

\setcounter{subsection}{0}
\subsection{String Access}  % 18.1.

\edithead {\csdag 3}
\editstart
\\ \bf replace &
\cltxt
  {\clkwd schar} {\em simple-string index}             [{\em Function}]
\\ \bf with &
\cltxt
  {\clkwd schar} {\em simple-base-string index}        [{\em Function}]
\editend
\\
\edithead {\csdag 4}
\editstart
\\ \bf replace &
\cltxt
  character object.  (This character will necessarily satisfy the
  predicate
  {\clkwd string-char-p}).
\\ \bf with &
\cltxt
  character object.
\editend
\\
\edithead {\csdag 9}
\editstart
\\ \bf replace &
\cltxt
  {\clkwd setf} may be used with {\clkwd char} to destructively
  replace a character within a string.
\\ \bf with &
\cltxt
  {\clkwd setf} may be used with {\clkwd char} to destructively
  replace a character within a string.
  The new character must be of a type which can be stored in the
  string; it is an error otherwise.
\editend
\\
\edithead {\csdag 10}
\editstart
\\ \bf replace &
\cltxt
  it must be a simple string.
\\ \bf with &
\cltxt
  it must be a simple base string.
\editend

\setcounter{subsection}{2}
\subsection{String Construction and Manipulation}  % 18.3.

\edithead {\csdag 2}
\editstart
\\ \bf replace &
\cltxt
  {\clkwd make-string {\em size} \&key :initial-element  [{\em Function}]}
\\ \bf with &
\cltxt
  {\clkwd make-string {\em size} \&key :initial-element  :element-type
  [{\em Function}]}
\editend
\\
\edithead {\csdag 3}
\editstart
\\ \bf replace &
\cltxt
  This returns a string (in fact a simple string) of length {\em size},
  each of whose characters has been initialized to the
  {\clkwd :initial-element} argument.  If an {\clkwd :initial-element}
  argument is not specified, then the string will be initialized
  in an implementation-dependent way.
\\ \bf with &
\cltxt
  This returns a string of length {\em size},
  each of whose characters has been initialized to the
  {\clkwd :initial-element} argument.  If an {\clkwd :initial-element}
  argument is not specified, then the string will be initialized
  in an implementation-dependent way.
  The {\clkwd :element-type} argument names the type of the elements
  of the string; a string is constructed of the most specialized
  type that can accommodate elements of the given type.
\editend
\\
\edithead {\csdag 5}
\editstart
\\ \bf replace &
\cltxt
  A string is really just a one-dimensional array of "string
  characters" (that is,
  those characters that are members of type {\clkwd string-char}).
  More complex character arrays may be constructed using the function
  {\clkwd make-array}.
\\ \bf with &
\cltxt
  More complex character arrays may be constructed using the function
  {\clkwd make-array}.
\editend
\\
\edithead {\csdag 29}
\editstart
\\ \bf replace &
\cltxt
  If {\em x} is a string character (a character of type
  {\clkwd string-char}), then
\\ \bf with &
\cltxt
  If {\em x} is a character, then
\editend
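
The :element-type rule for make-string given earlier in this subsection (the most specialized string type that can accommodate the given element type) can be illustrated with a short Python model. This is a sketch only. The code limits, the default element type and the default initial element are assumptions, and string_type_for is a hypothetical helper. The type names are those used elsewhere in this proposal.

```python
# Illustrative model only. String types are listed from most to least
# specialized with the exclusive upper bound of the codes each can store.
# Both bounds are assumptions made for this sketch.
STRING_TYPES = [
    ("simple-base-string", 128),
    ("most-general-string", 0x110000),
]

# Assumed code bounds for the element types a caller may name.
ELEMENT_TYPE_LIMITS = {"base-character": 128, "character": 0x110000}


def string_type_for(element_type):
    """Pick the most specialized string type that can hold element_type."""
    needed = ELEMENT_TYPE_LIMITS[element_type]
    for name, limit in STRING_TYPES:
        if limit >= needed:
            return name
    raise ValueError("no string type can hold " + element_type)


def make_string(size, initial_element=" ", element_type="character"):
    """Model of make-string: returns (type name, list of elements)."""
    if ord(initial_element) >= ELEMENT_TYPE_LIMITS[element_type]:
        raise TypeError("initial element does not fit the element type")
    return string_type_for(element_type), [initial_element] * size
```

Requesting base-character elements yields the narrow string type, while requesting character falls through to the general one.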

%----------------------------------------------------------------------
\setcounter{section}{21}
\section{Input/Output}                      % 22

\setcounter{subsection}{0}
\subsection{Printed Representation of LISP Objects}  % 22.1.

\setcounter{subsubsection}{0}
\subsubsection{What the Read Function Accepts}  % 22.1.1.

\edithead {\csdag Table 22-1: Standard Character Syntax Types}
\editstart
\\ \bf delete entry &
\cltxt
  {\clkwd <tab>} {\em whitespace}
\\ &
  {\clkwd <page>} {\em whitespace}
\\ &
  {\clkwd <backspace>} {\em constituent}
\\ &
  {\clkwd <return>} {\em whitespace}
\\ &
  {\clkwd <rubout>} {\em constituent}
\\ &
  {\clkwd <linefeed>} {\em whitespace}
\editend

\setcounter{subsubsection}{1}
\subsubsection{Parsing of Numbers and Symbols}  % 22.1.2.

\edithead {\csdag Table 22-3: Standard Constituent Character
Attributes}
\editstart
\\ \bf delete entry &
\cltxt
  {\clkwd <backspace>} {\em illegal}
\\  &
  {\clkwd <tab>} {\em illegal}
\\  &
  {\clkwd <linefeed>} {\em illegal}
\\  &
  {\clkwd <page>} {\em illegal}
\\  &
  {\clkwd <return>} {\em illegal}
\\  &
  {\clkwd <rubout>} {\em illegal}
\editend

\setcounter{subsubsection}{3}
\subsubsection{Standard Dispatching Macro Character Syntax}  % 22.1.4.

\edithead {\csdag Table 22-4: Standard \# Macro Character Syntax}
\editstart
\\ \bf delete entry &
\cltxt
  {\clkwd \#<backspace>} {\em signals error}
\\  &
  {\clkwd \#<tab>} {\em signals error}
\\  &
  {\clkwd \#<linefeed>} {\em signals error}
\\  &
  {\clkwd \#<page>} {\em signals error}
\\  &
  {\clkwd \#<return>} {\em signals error}
\\  &
  {\clkwd \#<rubout>} {\em undefined}
\editend
\\
\edithead {\csdag 8}
\editstart
\\ \bf replace &
\cltxt
  The following names are standard across all implementations:
\\ \bf with &
\cltxt
  All characters, including extended characters, are uniquely
  named in an implementation-dependent manner.
  The following names are standard across all implementations:
\editend
\\
\edithead {\csdag 11 through 18 inclusive delete}
\editstart
\\ \bf delete &
\cltxt
  The following names are semi-standard; ...
\editend
\\
\edithead {\csdag 20 through 26 inclusive delete}
\editstart
\\ \bf delete &
\cltxt
  The following convention is used in implementations ...
\editend
\\
\edithead {\csdag 108}
\editstart
\\ \bf replace &
\cltxt
  {\clkwd \#<space>, \#<tab>, \#<newline>, \#<page>, \#<return>}
\\ \bf with &
\cltxt
  {\clkwd \#<space>, \#<newline>}
\editend

\setcounter{subsubsection}{4}
\subsubsection{The Readtable}  % 22.1.5.

\edithead {\csdag 3}
\editstart
\\ \bf replace &
\cltxt
  Even if an implementation supports characters with non-zero
  {\em bits} and {\em font}
  attributes, it need not (but may) allow for such characters to
  have syntax
  descriptions
  in the readtable.  However, every character of type
  {\clkwd string-char}
  must be represented in the readtable.
\\ \bf with &
\cltxt
  Even if an implementation supports extended characters, it
  need not
  (but may) allow for such characters to
  have syntax descriptions
  in the readtable.  However, every character of type
  {\clkwd base-character} must be
  represented in the readtable.
\editend

\setcounter{subsubsection}{5}
\subsubsection{What the Print Function Produces}  % 22.1.6.

\edithead {\csdag 13}
\editstart
\\ \bf replace &
\cltxt
  is used.  For example, the printed representation of the character
  \#$\backslash$A
  with control
  and meta bits on would be \#$\backslash${\clkwd CONTROL-META-A},
  and that of
  \#$\backslash$a with control and meta bits on would be
  \#$\backslash${\clkwd CONTROL-META-$\backslash$a}.
\\ \bf with &
\cltxt
  is used (see 22.1.4).
\editend

\setcounter{subsection}{2}
\subsection{Output Functions}  % 22.3.

\setcounter{subsubsection}{0}
\subsubsection{Output to Character Streams}  % 22.3.1.

\edithead {\csdag 26}
\editstart
\\ \bf replace &
\cltxt
  ({\em not} the substring delimited by {\clkwd :start} and
  {\clkwd :end}).
\\ \bf with &
\cltxt
  ({\em not} the substring delimited by {\clkwd :start} and
  {\clkwd :end}).
  Only characters which are members of the character set(s)
  associated with the output stream are valid to be written;
  it is an error otherwise.
\editend
\\
\edithead {\csdag 27 after}
\editstart
\\ \bf insert &
\cltxt
  {\clkwd external-width} {\em object} \&{\clkwd optional}
  {\em output-stream}   [{\em Function}]
\\  &
  {\clkwd external-width} returns the number of host system base
  character units required for the object on the output-stream. If
  not applicable to the output stream, the function
  should return {\clkwd nil}.
\editend
\\
\edithead {\csdag append to section}
\editstart
\\ \bf insert &
\cltxt
  {\clkwd *format-external-width*} [{\em Variable}]
\\  &
{\clkwd *format-external-width*} specifies how numeric parameters
in a format control string are interpreted.
This allows the directive parameters
to be used in producing columnar output, as long as the width
in bytes of the external code format corresponds to the
resulting width of the displayed or printed output.
\\ &
If {\clkwd *format-external-width*} is {\clkwd T} then
{\clkwd format} uses the destination stream type to interpret
numeric parameters as external format units for this type of
stream;  if the destination stream type is {\clkwd NIL}, numeric
parameters are interpreted as characters.  This is the default.
If {\clkwd NIL}, {\clkwd format} interprets numeric parameters as
characters, regardless of the destination stream type.
If the value is a keyword that specifies an external code
format recognized by the implementation (see {\clkwd open}),
{\clkwd format} interprets numeric parameters as external format
units when the destination stream is {\clkwd NIL}.  If the
destination stream type is not {\clkwd NIL}, this value has
no effect.
\editend
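
The interaction between external-width and *format-external-width* described above can be sketched in Python. This is a hedged model, not a specification. External code formats are represented by Python codec names, with Shift-JIS chosen only as a familiar mixed-width encoding. None stands for NIL and True for T, and parameter_unit and pad_to_column are hypothetical helpers. Where the text says a keyword value "has no effect" for a non-NIL destination, the model assumes the stream's own format is used.

```python
def external_width(text, encoding=None):
    """Model of external-width: units needed to write text, or None (NIL)
    when the notion does not apply to the destination."""
    if encoding is None:
        return None
    return len(text.encode(encoding))


def parameter_unit(format_external_width, stream_encoding):
    """Return the encoding whose units numeric parameters count,
    or None when they count characters.

    format_external_width: True, None, or a codec name (models a keyword).
    stream_encoding: None models a NIL destination.
    """
    if format_external_width is None:
        return None  # always characters
    if stream_encoding is not None:
        return stream_encoding  # T, or keyword with assumed fallback
    if format_external_width is True:
        return None  # T with a NIL destination: characters
    return format_external_width  # keyword with a NIL destination


def pad_to_column(text, width, unit_encoding):
    """Pad text with spaces so that it occupies width units."""
    if unit_encoding is None:
        used = len(text)
    else:
        used = external_width(text, unit_encoding)
    return text + " " * max(0, width - used)
```

Two kanji occupy four bytes in Shift-JIS, so padding them to a column of six adds two spaces when counting external units and four spaces when counting characters.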

\setcounter{subsubsection}{2}
\subsubsection{Formatted Output to Character Streams}  % 22.3.3.

\edithead {\csdag 23 delete example}
\editstart
\\ \bf delete &
\cltxt
  {\clkwd (format nil "Type} $\tilde{ }$
  {\clkwd :C to $\tilde{ }$ :A."} . . .
\editend
\\
\edithead {\csdag 66}
\editstart
\\ \bf replace &
\cltxt
  $\tilde{ }${\clkwd :C} spells out the names of the control bits and
  represents non-printing
  characters by their names: {\clkwd Control-Meta-F, Control-Return,
  Space}.
  This is a "pretty" format for printing characters.
\\ \bf with &
\cltxt
  $\tilde{ }${\clkwd :C}
  represents non-printing
  characters by their names: {\clkwd Newline,
  Space}.  This is a "pretty" format
  for printing characters.
\editend
%----------------------------------------------------------------------

%----------------------------------------------------------------------
\setcounter{section}{22}
\section{File System Interface}             % 23

\setcounter{subsection}{1}
\subsection{Opening and Closing Files}  % 23.2.

\edithead {\csdag 2}
\editstart
\\ \bf replace &
\cltxt
  {\clkwd open {\em filename} \&key :direction :element-type}
  {\clkwd :if-exists :if-does-not-exist}
  [{\em Function}]
\\ \bf with &
\cltxt
  {\clkwd open {\em filename} \&key :direction :element-type}
  {\clkwd :character-set
  :external-code-format}
  {\clkwd :if-exists :if-does-not-exist}
  [{\em Function}]
\editend
\\
\edithead {\csdag 11}
\editstart
\\ \bf replace &
\cltxt
  {\clkwd string-char}
\\  &
  The unit of transaction is a string-character.  The functions
  {\clkwd read-char}
  and/or {\clkwd write-char} may be used on the stream.  This is
  the default.
\\ \bf with &
\cltxt
  {\clkwd base-character}
\\  &
  The unit of transaction is a base character.  The functions
  {\clkwd read-char}
  and/or {\clkwd write-char} may be used on the stream.  This is
  the default.
\editend
\\
\edithead {\csdag 16}
\editstart
\\ \bf replace &
\cltxt
  {\clkwd character}
\\  &
  The unit of transaction is any character, not just a string-character.
  The functions
\\ \bf with &
\cltxt
  {\clkwd character}
\\  &
  The unit of transaction is any character.
  The functions
\editend
\\
\edithead {\csdag 19 after}
\editstart
\\ \bf insert &
\cltxt
  {\clkwd :external-code-format}
\\  &
This argument specifies
keyword(s) indicating an implementation-recognized scheme for
representing one or more character sets with non-homogeneous codes.
\\  &
The default is the natural system character representation,
the base character representation.
\\  &
For example, the SO/SI SBCS/DBCS convention used by IBM on 370
machines could be selected by a keyword
{\clkwd :shift-delimited}.
The compact run-encoding convention defined by XEROX could be
selected by {\clkwd :run-encoded}.
The SBCS/DBCS convention based on
ASCII which uses leading bit patterns to distinguish two-byte codes
from one-byte codes could be selected by a keyword like
{\clkwd :high-byte-delimited}.
\\  &
As many {\clkwd :character-set} names must be provided as the
implementation requires for that external coding convention.
For example, if {\clkwd :shift-delimited} were the
{\clkwd :external-code-format} argument, two character set specifiers
would have to be provided.
\\  &
\editend
\\
\edithead {\csdag 19 after}
\editstart
\\ \bf insert &
\cltxt
  {\clkwd :character-set}
\\  &
This argument specifies an implementation-defined
name or list of names of
defined character sets in the form of keywords.
The default is the base character set when
{\clkwd :external-code-format} is also defaulted.  If a non-default
value is specified for {\clkwd :external-code-format}, there may be a
different default for {\clkwd :character-set}.
\editend
%----------------------------------------------------------------------

%----------------------------------------------------------------------
\begin{thebibliography}{wwwwwwww 99}


\bibitem[Ida87]{ida87} M. Ida, et al.,
{\em
JEIDA Common LISP Committee Proposal on Embedding Multi-Byte Characters
},
ANSI X3J13 document 87-022, (1987).

\bibitem[Linden87]{linden87} T. Linden,
{\em
Common LISP - Proposed Extensions for International Character Set
Handling
},
Version 01.11.87, IBM Corporation (1987).

\bibitem[Kerns87]{kerns87} R. Kerns,
{\em
Extended Characters in Common LISP
},
X3J13 Character Subcommittee document, Symbolics Inc (1987).

\bibitem[Steele84]{steele84} G. Steele Jr.,
{\em
Common LISP: the Language
},
Digital Press (1984).

\end{thebibliography}

\end{document}             % End of document.

∂26-Sep-88  1032	CL-Characters-mailer 	relay of Ito message   
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 26 Sep 88  10:31:40 PDT
Date: Mon, 26 Sep 88 09:05:16 PDT
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880926.090516.baggins@IBM.com>
Subject: relay of Ito message


======================================================================


Date: Mon, 26 Sep 88 14:31:59 jst
From: Takayasu ITO <ito%ito.ito.ecei.tohoku.junet%utokyo-relay.csnet@RELAY.CS.NET>
Return-Path: <ito@ito.ito.ecei.tohoku.junet>
Message-Id: <8809260531.AA00524@ito.ito.ecei.tohoku.junet>
To: baggins%ibm.com%relay.cs.net%u-tokyo.junet%utokyo-relay.csnet@RELAY.CS.NET

Status: R
Dear Dr. Linden,
I received your express airmail, which contains the DRAFT on Int'l Character
Sets for the X3J13 October meeting.
I read the DRAFT DRAFT, which was given to me by Mr. Kurokawa of IBM Japan.
If the DRAFT is essentially the same as the DRAFT DRAFT, we have many comments
on it.
Since we are going to have our 3rd Special Meeting on Character Sets on
October 3rd, we will let you know our opinions on your proposal and
on our proposal to ISO WG16 before the X3J13 October meeting.
On October 14 and 15 we are going to have a small meeting to prepare our
documents for ISO WG16. When we are ready to distribute them, we will send a
copy to you.
Thank you for your physical mail.
Sincerely,
Takayasu Ito

∂28-Sep-88  1236	CL-Characters-mailer 	comments on character proposal   
Received: from cs.utah.edu by SAIL.Stanford.EDU with TCP; 28 Sep 88  12:36:26 PDT
Received: by cs.utah.edu (5.54/utah-2.0-cs)
	id AA19059; Wed, 28 Sep 88 13:34:55 MDT
Received: by defun.utah.edu (5.54/utah-2.0-leaf)
	id AA06808; Wed, 28 Sep 88 13:34:53 MDT
From: sandra%defun@cs.utah.edu (Sandra J Loosemore)
Message-Id: <8809281934.AA06808@defun.utah.edu>
Date: Wed, 28 Sep 88 13:34:51 MDT
Subject: comments on character proposal
To: cl-characters@sail.stanford.edu

The only thing I really found confusing about this proposal was the
elimination of the semi-standard characters and section 2.2.3 on
non-standard characters.  With that gone, it is not left entirely
clear that an implementation may support other named characters
besides #\space and #\newline, until the reader gets to chapter 14 and
the description of CHAR-NAME, where we are told that all non-graphic
characters have names.  I really think that chapter 2 ought to include
something to the effect that an implementation can support named
characters that are not in the standard character set.

On the whole, the proposal looks pretty good.

-Sandra
-------

∂29-Sep-88  1506	CL-Characters-mailer 	char-name    
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 29 Sep 88  15:06:35 PDT
Date: Thu, 29 Sep 88 12:54:20 PDT
From: Thom Linden <baggins@ibm.com>
To: "Sandra J Loosemore" <sandra%defun@cs.utah.edu>
cc: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880929.125420.baggins@IBM.com>
Subject: char-name


>The only thing I really found confusing about this proposal was the
>elimination of the semi-standard characters and section 2.2.3 on
>non-standard characters.  With that gone, it is not left entirely
>clear that an implementation may support other named characters
>besides #\space and #\newline, until the reader gets to chapter 14 and
>the description of CHAR-NAME, where we are told that all non-graphic
>characters have names.  I really think that chapter 2 ought to include
>something to the effect that an implementation can support named
>characters that are not in the standard character set.


Thanks for the comment.  I agree this would be a good addition.

Regards,
  Thom

∂29-Sep-88  1507	CL-Characters-mailer 	character proposal comments 
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 29 Sep 88  15:06:51 PDT
Date: Thu, 29 Sep 88 14:06:02 PDT
From: Thom Linden <baggins@ibm.com>
To: Dave Unietis <dru@lucid.com>
cc: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880929.140602.baggins@IBM.com>
Subject: character proposal comments

David,
  Thanks for your review and comments.

  In our upcoming Monday discussion (and, I imagine, in the J13 mtg) we
will definitely cover the simple-string and equivalency topics. We
don't want different semantics between ISO and ANSI for
simple-strings so this item must be resolved.  As for
equivalency, I favor inclusion of static equivalence classes
but consider this orthogonal to the rest of the proposal and
therefore might be omitted from ANSI if we can't come to agreement.
(ps. I believe the ISO committee on character sets is working toward
some universal repertoire/encoding but I don't have any firm info)

  Your contribution of experience with an implementation will be
quite helpful in resolving these items.

  I'm afraid I don't have any comments/documents from JEIDA; I imagine
they are being drafted, as we (net)speak, for the ISO meeting in Nov.
I believe one of the important points may be inclusion of static
equivalency class.

Regards,
  Thom
=========================================================================
Received: from  lucid.com by IBM.COM on 09/29/88 at 12:56:27 PDT
Received: from jack-jr ([192.9.200.25]) by heavens-gate.lucid.com id AA01768g; Thu, 29 Sep 88 11:53:34 PST
Received: by jack-jr id AA02812g; Thu, 29 Sep 88 12:52:13 PDT
Date: Thu, 29 Sep 88 12:52:13 PDT
From: Dave Unietis <dru@lucid.com>
Message-Id: <8809291952.AA02812@jack-jr>
To: baggins@ibm.com
In-Reply-To: Thom Linden's message of Fri, 16 Sep 88 17:03:24 PDT <880916.170324.baggins@IBM.com>
Subject: cs proposal

I received the latest draft of the character set proposal, and it seems
to adequately cover most of the issues raised by my earlier comments.  The
issue I brought up concerning the type definition of most-general-string
was entirely my fault - I misread the type definition of string in the latest
draft.  Defining the string type as a disjunction of other types solves
the problem satisfactorily.

I have a few remaining comments on the issues below:


* Simple-strings and SCHAR

We have no direct user experience to report here, but rather are basing our
opinion on the original JEIDA proposal as well as discussions with IBM Japan
and CSK, all of whom strongly desire compatible string access.
Furthermore, we've done some measurements of our prototype Kanji
implementations that treat SCHAR in this manner, and they indicate that the
performance impact is fairly small.  Of course, this experience is only
relevant to general-purpose architectures; it may be more difficult and/or
expensive to re-implement SCHAR this way on microcoded Lisp machines - I
wonder how much influence this contingent has had on the discussion...


* Equivalence classes

To me, it seems unrealistic to expect ISO to standardize on a non-overlapping
character set, when all existing Kanji character sets (at least, all I know
about) contain a 'double-byte' version of either ASCII or EBCDIC embedded in
them.


* JEIDA

I'm concerned that their input may be arriving too late, especially if adopting
their recommendations would result in substantial revisions.  The message you
forwarded from Professor Ito suggests that they do have significant comments.
At the very least, I feel we need to set aside part of the Monday meeting for a
review of their suggestions.  If it is possible for you to get the meeting
attendees a copy in advance, it would be helpful.


Overall, the proposal is looking quite good.


- David

∂30-Sep-88  0013	CL-Characters-mailer 	character proposal comments 
Received: from Riverside.SCRC.Symbolics.COM (SCRC-RIVERSIDE.ARPA) by SAIL.Stanford.EDU with TCP; 30 Sep 88  00:12:57 PDT
Received: from F.ILA.Dialnet.Symbolics.COM (FUJI.ILA.Dialnet.Symbolics.COM) by Riverside.SCRC.Symbolics.COM via DIAL with SMTP id 284704; 30 Sep 88 03:09:34 EDT
Received: from CALVARY.ILA.Dialnet.Symbolics.COM by F.ILA.Dialnet.Symbolics.COM via CHAOS with CHAOS-MAIL id 1089; Fri 30-Sep-88 02:08:02 EDT
Date: Fri, 30 Sep 88 02:08 EDT
From: RWK@FUJI.ILA.Dialnet.Symbolics.COM
Sender: MAS-B@FUJI.ILA.Dialnet.Symbolics.COM
Subject: character proposal comments
To: Dave Unietis <dru@lucid.com>
cc: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
In-Reply-To: <880929.140602.baggins@IBM.com>
Message-ID: <19880930060800.4.MAS-B@CALVARY.ILA.Dialnet.Symbolics.COM>

    Date: Thu, 29 Sep 88 12:52:13 PDT
    From: Dave Unietis <dru@lucid.com>
    * Simple-strings and SCHAR
    
    We have no direct user experience to report here, but rather are basing our
    opinion on the original JEIDA proposal as well as discussions with IBM Japan
    and CSK, all of whom strongly desire compatible string access.
    Furthermore, we've done some measurements of our prototype Kanji
    implementations that treat SCHAR in this manner, and they indicate that the
    performance impact is fairly small. Of course, this experience is only
    relevant to general-purpose architectures; it may be more difficult and/or
    expensive to re-implement SCHAR this way on microcoded Lisp machines - I
    wonder how much influence this contingent has had on the discussion...

Simple answer: no influence at all. SCHAR is exactly the same as CHAR
which is exactly the same as AREF for all microcoded implementations
which I am aware of. (I don't know what Xerox does, but I think I have
all the others covered).  The whole purpose of SCHAR is to satisfy the
requirements of the so-called "general-purpose" architectures.  (Really
now, wouldn't it be more accurate to call these specialized for non-lisp?)

    * Equivalence classes
    
    To me, it seems unrealistic to expect ISO to standardize on a non-overlapping
    character set, when all existing Kanji character sets (at least, all I know
    about) contain a 'double-byte' version of either ASCII or EBCDIC embedded in
    them.

JIS does not have a second version of ASCII, but it does have second
versions of a great many ASCII symbols.

There is some question as to whether the embedded romaji characters
(i.e. "english letters") in JIS character set are the same characters
semantically as the ASCII characters, or are special symbols.  Let me
list some of the confusing aspects:

1) ISO provides for switching between JIS and other language character
   sets. Why do the JIS embedded romaji exist?
  a) So you can use the JIS characters without ISO?  This would imply
     they mean the same.
  b) So there are separate characters which can be used for special purposes
     inside Kanji text.  (Foreign words would normally be rendered in Katakana).
  c) To indicate that the characters should be displayed in the same size
     square as the Kanji. If they're otherwise the same characters, this would
     argue for overlapping sets.
  I suspect it would be helpful to know the real answer to this one.
2) Existing practice is inconsistent.
  a) The Kanji Macintosh software I have seen is pretty pathetic; applications
     suffer from "two byte disease". I cannot yet comment coherently on this
     one because of the other poor qualities of MacKanji.  (Carl Hoffman tells me
     better software is available).
  b) Japanese word processors I have seen vary in their handling of romaji. I
     have seen them treated as just a spacing variant, and I have seen them
     treated very differently on input.  I haven't yet found out how
     they treat them in searches, which I think is the definitive test.  Perhaps
     after I learn more Japanese...
  c) The Symbolics Japanese support provides for optional canonicalization on
     input.  I believe this is because of conflicting requests, although I'm not
     certain.
3) The Japanese community appears divided on the issue.  This may actually only
   be a communication problem, but I have been told both things: that they are
   separate characters not considered to have the same meaning, and they are
   distinct characters.  I'm not sure how to identify a definitive answer to this.
   I am sure much of my confusion comes from dealing with individual people and/or
   organizations.

That all said, I can tell you that my bias is to treat them as having
the same meaning, and just a different typeface.

∂30-Sep-88  1235	CL-Characters-mailer 	character proposal comments 
Received: from lucid.com by SAIL.Stanford.EDU with TCP; 30 Sep 88  12:35:40 PDT
Received: from jack-jr ([192.9.200.25]) by heavens-gate.lucid.com id AA00787g; Fri, 30 Sep 88 11:33:29 PST
Received: by jack-jr id AA05853g; Fri, 30 Sep 88 12:31:20 PDT
Date: Fri, 30 Sep 88 12:31:20 PDT
From: Dave Unietis <dru@lucid.com>
Message-Id: <8809301931.AA05853@jack-jr>
To: RWK@FUJI.ILA.Dialnet.Symbolics.COM
Cc: cl-characters@sail.stanford.edu
In-Reply-To: RWK@FUJI.ILA.Dialnet.Symbolics.COM's message of Fri, 30 Sep 88 02:08 EDT <19880930060800.4.MAS-B@CALVARY.ILA.Dialnet.Symbolics.COM>
Subject: character proposal comments


    Date: Fri, 30 Sep 88 02:08 EDT
    From: RWK@FUJI.ILA.Dialnet.Symbolics.COM
    * Equivalence classes

    There is some question as to whether the embedded romaji characters
    (i.e. "english letters") in JIS character set are the same characters
    semantically as the ASCII characters, or are special symbols.  Let me
    list some of the confusing aspects:

    1) ISO provides for switching between JIS and other language character
       sets. Why do the JIS embedded romaji exist?
      a) So you can use the JIS characters without ISO?  This would imply
         they mean the same.
      b) So there are separate characters which can be used for special purposes
         inside Kanji text.  (Foreign words would normally be rendered in
         Katakana).
      c) To indicate that the characters should be displayed in the same size
         square as the Kanji. If they're otherwise the same characters, this
         would argue for overlapping sets.
      I suspect it would be helpful to know the real answer to this one.

I'm not sure a "real answer" exists, but I think one reason romaji came into
existence was to allow English letters and symbols to be input easily without
constantly shifting in and out of some special keyboard mode.  Also, the 
resulting combined Kanji/romaji data could be more easily formatted into
columns, tables and the like because the characters are all fixed-width.
Regardless of how important either one of these may be in the future, there 
appears to be a large amount of existing programs and data that depend on
the "double-square" display characteristic of romaji. 


    3) The Japanese community appears divided on the issue.  This may actually 
       only be a communication problem, but I have been told both things: that
       they are separate characters not considered to have the same meaning, 
       and they are distinct characters.  I'm not sure how to identify a
       definitive answer to this.  I am sure much of my confusion comes from
       dealing with individual people and/or organizations.

       That all said, I can tell you that my bias is to treat them as having
       the same meaning, and just a different typeface.

I also have received much conflicting information on this issue, but the 
consensus seems to be that when romaji characters are treated "syntactically"
(whatever that means), they should be considered equivalent to the
corresponding ASCII, but when treated as data, they should be processed 
transparently.  I'm sure much of the confusion stems from the fact that in 
Lisp this distinction is quite difficult to make.

   2) Existing practice is inconsistent.
      a) The Kanji Macintosh software I have seen is pretty pathetic; 
         applications suffer from "two byte disease". I cannot yet comment
         coherently on this one because of the other poor qualities of MacKanji.
         (Carl Hoffman tells me better software is available).
      b) Japanese word processors I have seen vary in their handling of romaji.
         I have seen them treated as just a spacing variant, and I have seen
         them treated very differently on input.  I haven't yet found
         out how they treat them in searches, which I think is the definitive 
         test.  Perhaps after I learn more Japanese...
      c) The Symbolics Japanese support provides for optional canonicalization
         on input.  I believe this is because of conflicting requests, although
         I'm not certain.

Under Linden's equivalence class proposal, whenever a romaji character is read 
in "non-escape" mode, such as when reading a left parenthesis to start a 
list, or when reading a symbol, the character is first canonicalized to its
ASCII equivalent, and then processed.  Thus '( ' is converted to '(' and thus
"inherits" its syntax, and 'a', 'A', 'a ' and 'A ' are all converted to 'A'
in symbols.  When in escape mode, such as when reading strings, romaji
characters are left unchanged.  If the equivalence class is defined as a
static rather than rebindable property of the character set, problems such 
as uncertain symbol-EQness are avoided. 
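
A minimal sketch of the reader behaviour described above, written in Python
rather than Lisp and meant only as an illustration.  It assumes the Unicode
fullwidth forms (U+FF01 through U+FF5E) as a stand-in for the JIS romaji, and
a fixed offset as the static equivalence class.  The names to_ascii and
read_canonicalized are hypothetical; backslash escapes and |...| symbol
quoting are ignored.

```python
# Toy model of reader-level canonicalization; not any implementation's reader.
FULLWIDTH_FIRST = 0xFF01   # fullwidth exclamation mark
FULLWIDTH_LAST = 0xFF5E    # fullwidth tilde
OFFSET = 0xFEE0            # distance from the fullwidth block to ASCII '!'..'~'


def to_ascii(ch):
    """Map a fullwidth (romaji) character to its ASCII equivalent, if any."""
    if FULLWIDTH_FIRST <= ord(ch) <= FULLWIDTH_LAST:
        return chr(ord(ch) - OFFSET)
    return ch


def read_canonicalized(text):
    """Canonicalize outside string literals; leave string contents untouched."""
    out = []
    in_string = False
    for ch in text:
        if in_string:
            out.append(ch)          # "escape mode": data passes through unchanged
            if ch == '"':
                in_string = False
            continue
        ch = to_ascii(ch)           # "non-escape mode": fold to ASCII first
        if ch == '"':
            in_string = True
            out.append(ch)
        else:
            out.append(ch.upper())  # symbol characters fold to upper case
    return "".join(out)


print(read_canonicalized('（ｆｏｏ "ｂａｒ"）'))   # (FOO "ｂａｒ")
```

Because the mapping here is a fixed table rather than a rebindable variable,
two spellings of a symbol always read as the same name.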

I'm sure this won't make everyone happy, but it seems to come the closest of 
the proposals I've heard so far.  On the other hand, given the confusion 
surrounding this issue, perhaps waiting for JEIDA's recommendation is the
right thing to do, if it can be obtained prior to the Oct. meeting. 

Dave

∂30-Sep-88  1554	CL-Characters-mailer 	Re: character proposal comments  
Received: from Xerox.COM by SAIL.Stanford.EDU with TCP; 30 Sep 88  15:54:05 PDT
Received: from Cabernet.ms by ArpaGateway.ms ; 30 SEP 88 14:53:09 PDT
Date: 30 Sep 88 14:52 PDT
From: masinter.pa@Xerox.COM
Subject: Re: character proposal comments
In-reply-to: RWK@FUJI.ILA.Dialnet.Symbolics.COM's message of Fri, 30 Sep 88
 02:08 EDT
To: RWK@FUJI.ILA.Dialnet.Symbolics.COM
cc: Dave Unietis <dru@lucid.com>, "X3J13: Character Subcommittee"
 <cl-characters@sail.stanford.edu>
Message-ID: <880930-145309-1143@Xerox>

In Xerox Common Lisp / Medley, SCHAR is slower interpreted, since it
actually checks that its argument is a string. The compiled optimizer
generates the same code as AREF.

Frankly, I think SCHAR is an odd beast -- most other declarations and type
annotations in the language are done with "the" and "declare". 

Maybe it would do as well to do away with SCHAR.  (A purist would eliminate
them all and say "use ELT",  but that's probably going too far.)

My general point is that some of the optimizations that made sense at the
time CLtL was written no longer do, and we might be able to simplify the
language rather than make it more complex.


∂03-Oct-88  1159	CL-Characters-mailer 	subcommittee meeting   
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 3 Oct 88  11:59:37 PDT
Date: Mon, 03 Oct 88 11:36:22 PDT
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
cc: Mike Beckerle <rpk@wheaties.ai.mit.edu>
Message-ID: <881003.113622.baggins@IBM.com>
Subject: subcommittee meeting

Ok. Jan Z. has made arrangements for our subcommittee meeting
on Monday, 10 Oct.  It will be at Contel from 9:30 to 5pm.  I
don't have a room number so you'll have to ask for Mathis at Contel
reception.  I'll be at the Holiday Inn Sunday evening.

Regards and safe travel,
  Thom

∂06-Oct-88  1012	CL-Characters-mailer 	some comments on the proposal    
Received: from decwrl.dec.com by SAIL.Stanford.EDU with TCP; 6 Oct 88  10:12:47 PDT
Received: by decwrl.dec.com (5.54.5/4.7.34)
	id AA03124; Thu, 6 Oct 88 10:11:11 PDT
Date: Thu, 6 Oct 88 10:11:11 PDT
Message-Id: <8810061711.AA03124@decwrl.dec.com>
From: vanroggen%aitg.DEC@decwrl.dec.com
To: cl-characters@sail.stanford.edu
Subject: some comments on the proposal





                                 Comments on
                     "DRAFT: Extensions to Common LISP"
                    "to Support International Character"
                        "Sets" dated 9 September 1988


                                     by

                                 Ron Brender
                        Digital Equipment Corporation
                               6 October 1988


     Overall, I think the approach is  excellent  and  provides  a  good
     foundation for dealing with a variety of character sets in a useful
     and flexible manner.


     I note that I am not expert in the LISP language --  while  I  have
     read   a   lot  about  LISP,  I  have  never  programmed  in  LISP.
     None-the-less, the definitional approach seems quite  clear  and  I
     hope the following comments will be of use.



     1  OBJECTIVES

     (See Section  1.1,  pp3-4;  also  A.13.4,  pp27-28.)  It  might  be
     appropriate  to  note  in  this  introduction  that  the objectives
     intentionally exclude a variety of issues that often come under the
     title "Internationalization".  Such topics are things like date and
     time formats, and the like.  This would not be worth mentioning but
     for  the fact that char-upcase and char-downcase provide operations
     that should be dependent on information outside  of  the  character
     set as such to perform properly.  The standard example is lowercase
     e-acute, which  should  convert  to  uppercase  E-acute  in  French
     French, but to uppercase E without acute in Canadian French.  There
     are many other examples even within the ISO Latin-1 character set.


     In the absence of a more  general  attack  on  internationalization
     issues  (I  don't  recommend  such an effort on the part of LISP at
     this time -- wait for others  to  lead  the  way)  the  meaning  of
     char-upcase  and  char-downcase  for characters outside of the LISP
     standard  character  set  should   be   explicitly   specified   as
     implementation-defined.
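
     The point can be illustrated with a short Python sketch (not Lisp,
     and  not  part  of the draft).  With no locale information, a
     library has to pick one mapping;  the  code  shows  the  fixed,
     locale-independent  choice  that  Python's built-in str.upper
     makes.  The wrapper name default_upcase is hypothetical.

```python
def default_upcase(text):
    """Upper-case using Python's built-in, locale-independent mapping."""
    return text.upper()


for sample in ("e", "\u00e9", "\u00df", "i"):
    # \u00e9 is e-acute, \u00df is German sharp s
    print(repr(sample), "->", repr(default_upcase(sample)))
```

     Python always keeps the accent on  E-acute,  expands  sharp  s
     to  two  characters, and maps i to I even where Turkish usage
     would expect a dotted capital.  Each of these is a policy choice
     rather than a property of the character set.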


     2  NAMING CHARACTER SETS AND REPERTOIRES

     (See Section 2.1.) I support the decision to avoid any  attempt  to
     provide   names   for   particular   character  sets  and/or  other
     repertoires at this time.  Yet, it seems clear that portability  of
     applications  would  be  enhanced  if  there  were  an  established
     lexicon.  Moreover, this same issue will surely arise in  one  form
     or  another  in the context of every programming language that adds
     capability for large and/or multiple character sets.


     I suggest X3J13 send a request to, most likely,  ISO-IEC  JTC1/SC22
     (Programming  Languages)  recommending  issues  such  as  this that
     should be  addressed  across  programming  languages,  probably  in
     conjunction  with  SC2 (Character Sets and Controls) and SC21 (Data
     Bases).



     3  EXTERNAL-WIDTH AND FORMAT-EXTERNAL-WIDTH

     (See Section 2.3, p11; also A.22.3.1, p35 [A.22.3.2 is  missing?].)
     The  external-width  function  is  most  appropriate.  Further, the
     sentence observing that this function "does not address the problem
     of  display  width" should be emphasized more strongly, even in the
     absence of proportional fonts.


     Further,  even  the  suggestion  that   the   format-external-width
     variable  is  relevant  to  producing  columnar  output  for (only)
     certain external code formats deserves to be stricken entirely from
     this  discussion.   If  the production of simple columnar output is
     worthwhile -- and I think it is -- then I urge  that  X3J13  search
     for a means to achieve this that is independent of artifacts of the
     external  representation.   The  suggested  approach   happens   to
     more-or-less   work   at  the  moment  with  many  common  external
     representations,  but this is less likely to  continue  to  be  true
     in  the  future  --  in  particular,  as  the  Multiple-Octet Coded
     Character Set being defined by ISO-IEC JTC1/SC2/WG2 comes into  use
     (an ISO DP is expected out by the end of this year (1988)).


     Since assisting with columnar output seems to be the  sole  purpose
     of  this variable, I recommend it be withdrawn completely from this
     proposal.

∂06-Oct-88  1054	CL-Characters-mailer 	Re: character proposal comments  
Received: from Xerox.COM by SAIL.Stanford.EDU with TCP; 6 Oct 88  10:54:15 PDT
Received: from Semillon.ms by ArpaGateway.ms ; 06 OCT 88 10:41:23 PDT
Date: 6 Oct 88 10:41 PDT
From: masinter.pa@Xerox.COM
Subject: Re: character proposal comments
In-reply-to: RWK@FUJI.ILA.Dialnet.Symbolics.COM's message of Fri, 30 Sep 88
 02:08 EDT
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <881006-104123-2945@Xerox>

As far as I can tell, none of the standards committees working on character
identification within either X3 or ISO are working on an encoding where
there will be more than one code for the same semantic character
identifier, as is in JIS. 

While the current JIS standard has second versions of many ASCII
characters, it would seem inappropriate to bend the X3J13 standard to
support a feature which is not consistent with the other X3 or ISO
standards in preparation. 

We should distinguish what might be the right technical decision for a
particular implementation from what is the right design for a national and
international standard; especially if the standard can accommodate
JIS-compatible extensions.

I've recently heard from the Xerox representative to X3L2 that they're also
working on character encoding schemes in conjunction with ISO JTC1 SC2 WG2.
I don't know the exact protocol here, but if X3J13 is to establish a
liaison, should it not be with the other ANSC X3 committees?

∂06-Oct-88  1136	CL-Characters-mailer 	Comments on ANSI Draft 
Received: from RELAY.CS.NET by SAIL.Stanford.EDU with TCP; 6 Oct 88  11:36:10 PDT
Received: from relay2.cs.net by RELAY.CS.NET id aa15490; 6 Oct 88 12:58 EDT
Received: from utokyo-relay by RELAY.CS.NET id at23549; 6 Oct 88 12:33 EDT
Received: by ccut.cc.u-tokyo.junet (5.51/6.3Junet-1.0/CSNET-JUNET)
	id AA08072; Thu, 6 Oct 88 17:46:00 JST
Received: by nttlab.ntt.jp (3.2/6.2NTT.h) with TCP; Thu, 6 Oct 88 15:58:45 JST
Received: by tutics.tut.junet (ver3.3/6.2J/systemV)
        id AA04832; Thu,  6 Oct 88 11:06:50 jst
Message-Id: <8810061106.AA04832@tutics.tut.junet>
Date: Thu,  6 Oct 88 11:06:50 jst
From: Taiichi Yuasa <yuasa%tutics.tut.junet@UTOKYO-RELAY.CSNET>
To: cl-characters@SAIL.STANFORD.EDU
Subject: Comments on ANSI Draft
Cc: baggins@IBM.COM, mathis@b.isi.edu



Comments on

"DRAFT: Extensions to Common LISP to Support International Character Sets"

from the Character Set Subcommittee of Japanese SC22/LISP WG.
(compiled by Taiichi Yuasa, secretary of SC22/LISP WG, 05 OCT 88)


(a) Some technical terms are not clear.  For instance,

	1. The notion of "character set" is not defined.
	   This term appears several times in the draft:
		"more than one character set" (page 6, line 16)
		"multiple character sets" (page 6, line 17)
	   Does it mean "character repertoire" or "coded character set"?
	   This distinction must be clear because there may be
	   multiple coded character sets for a single character repertoire,
	   which is our case in Japan.

	2. It is not clear what "an implementation SUPPORTs a character set"
	   means.
	   We ourselves have discussed what "support" means but have not
	   found any reasonable definition yet.

	3. The sentence "it must define the sets supported and their 
	   characteristics" (page 6, line 17) is quite vague.

(b) The relation among the string types is not clear.
    We guessed
	SIMPLE-BASE-STRING is a subtype of BASE-STRING, and
	BASE-STRING is a subtype of MOST-GENERAL-STRING,
    but we are not sure.
    Also, we guessed
	MOST-GENERAL-STRING is identical to STRING,
    but then what is the role of the name MOST-GENERAL-STRING?
    We do not know whether the draft suggests the possibility that
    there are some strings that are not MOST-GENERAL-STRING.

(c) The draft specification leaves too many things unspecified.
    We wonder if the specification will increase the international portability
    of application programs.
    Most of us would like to include Kanji characters in BASE-CHARACTER but
    some of us would rather like to put them in EXTENDED-CHARACTER.
    We found no description on this issue in the draft.

(d) We need some mechanism for syntactic equivalency among characters, such
    as the one proposed by Thom Linden.  We are wondering why such an
    important mechanism is not included in the draft.
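
The relation guessed at in (b) can be checked mechanically.  The
following is only a sketch, and it assumes an implementation of the later
ANSI Common Lisp standard, where BASE-STRING and SIMPLE-BASE-STRING are
defined.  MOST-GENERAL-STRING is the draft's name and is not assumed to
exist; STRING stands in for it here.  STRING-TYPE-CHAIN-P is a
hypothetical helper name.

```lisp
;; Assumption: an ANSI Common Lisp implementation.  MOST-GENERAL-STRING
;; from the draft is not used; STRING plays its role in this sketch.
(defun string-type-chain-p ()
  "Return true if SIMPLE-BASE-STRING is a subtype of BASE-STRING and of
SIMPLE-STRING, and BASE-STRING is a subtype of STRING."
  (and (subtypep 'simple-base-string 'base-string)
       (subtypep 'base-string 'string)
       (subtypep 'simple-base-string 'simple-string)))
```

Whether any string can fall outside the most general string type is a
question about the draft's wording and is not settled by this check.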


∂06-Oct-88  1506	CL-Characters-mailer 	characters comments    
Received: from ti.com by SAIL.Stanford.EDU with TCP; 6 Oct 88  15:06:00 PDT
Received: by ti.com id AA20944; Thu, 6 Oct 88 17:04:42 CDT
Received: from Kelvin by tilde id AA27159; Thu, 6 Oct 88 16:52:12 CDT
Message-Id: <2801166760-10311358@Kelvin>
Sender: GRAY@Kelvin.csc.ti.com
Date: Thu, 6 Oct 88  16:52:40 CDT
From: David N Gray <Gray@DSG.csc.ti.com>
To: CL-Characters@SAIL.Stanford.edu
Cc: Bartley@MIPS.csc.ti.com
Subject: characters comments

Following are a few things that confused me about the proposal for
"Extensions to Common Lisp to Support International Character Sets"
(dated 9/9/88):

The semi-standard characters have been deleted without any reason given.

There doesn't seem to be any way to find out which repertoire a given
character object belongs to.

There doesn't seem to be any way to construct a character object for a
particular code and repertoire.

It is not clear what the meaning of CHAR-CODE-LIMIT is now.  Does the
char code include identification of the repertoire?  If so, it would
seem to be of little use.  If not, then wouldn't the maximum code value
be different for different repertoires?  If the code size is not able to
be different for different repertoires, then I don't see how the concept
of repertoires needs to be different from the old concept of font
numbers.

Does it really help to define standard keywords :CHARACTER-SET and
:EXTERNAL-CODE-FORMAT for the OPEN function if there are no standard
values for them?   Also, if these are not specified when opening for
input, then instead of specifying that the default is the base character
set, the proposal should permit defaulting from what the file system
knows about how the file was written.

∂07-Oct-88  0844	CL-Characters-mailer 	Symbolics comments on the Characters subcommittee report  
Received: from STONY-BROOK.SCRC.Symbolics.COM (SCRC-STONY-BROOK.ARPA) by SAIL.Stanford.EDU with TCP; 7 Oct 88  08:44:23 PDT
Received: from EUPHRATES.SCRC.Symbolics.COM by STONY-BROOK.SCRC.Symbolics.COM via CHAOS with CHAOS-MAIL id 472522; Fri 7-Oct-88 11:43:08 EDT
Date: Fri, 7 Oct 88 11:42 EDT
From: David A. Moon <Moon@STONY-BROOK.SCRC.Symbolics.COM>
Subject: Symbolics comments on the Characters subcommittee report
To: CL-Characters@sail.stanford.edu
Message-ID: <19881007154235.1.MOON@EUPHRATES.SCRC.Symbolics.COM>

Comments from Symbolics on "DRAFT: Extensions to Common Lisp
to Support International Character Sets", dated Sep 9, 1988

OVERALL COMMENT

In general we agree with this proposal, but there are some defects
in it that need to be remedied before it can be acceptable.  The
proposal is really not ready yet for voting.


MAJOR COMMENTS

* Pages 6 and 18 call for the meaning of the STRING-CHAR type specifier
to be incompatibly changed in the name of compatibility.  We oppose this.
Compatibility would be much easier to achieve by eliminating STRING-CHAR
from the language, allowing a user or an implementation to define it
with DEFTYPE to be whatever they require for compatibility.  (This would
leave (DECLARE (STRING-CHAR x)) undefined, unless an implementation added
it, since there is no way for a user to add declarations.)

* Page 11 says that (write-char #\newline stream) is no longer equivalent
to (terpri stream).  This directly contradicts the last paragraph of CLtL
p.22, which this proposal does not amend.  We can see no justification for
this incompatible change; outputting a newline character should remain
equivalent to calling the terpri function.  The fact that many external
character encoding schemes treat newline as a special case applies equally
to the newline character and the terpri function and does not justify
changing them to be non-equivalent.

* Pages 11 and 34-5:  The EXTERNAL-WIDTH function and FORMAT features are
much less well thought-out than the rest of the proposal, are described in
a self-contradictory way, and are unrelated to the main topic of this
proposal.  They should be removed, and proposed separately when they have
been more carefully thought out.  We could offer more detailed criticisms,
but that doesn't seem useful at this time.  By the way, the Cleanup
committee issue STREAM-INFO appears to cover the same ground.

* Page 21 uses a type-specifier list (character :standard) in an example
but there is no definition of what this means nor what the valid syntax is.

* Pages 6, 23, and 25 mandate that CHAR-EQUAL is unaffected by all
implementation-defined character attributes.  This is not an acceptable
generalization; the effect, if any, on CHAR-EQUAL of each
implementation-defined character attribute has to be specified as part of
the definition of that attribute.  Symbolics Genera, for example, has one
implementation-defined character attribute that definitely should affect
CHAR-EQUAL and another that definitely should not.


MINOR COMMENTS (not so minor that they can be ignored!)

The introduction makes no mention of extended typesetting symbols, such as
accent marks and the copyright and trademark symbols.  If Lisp is to be
used for real-world applications, these are necessary.

Page 10 refers to the representation of coded character sets as keyword
symbols.  Why not use CLOS objects?  There might be reasons, but you should
state them.  Also there should be a portable way to refer to the base
character set.  In general the language representation of character sets
and of character repertoires is very poorly specified and the proposal
needs to be extended to cover this.

Pages 11, 36, 37: There are several problems with OPEN options:

 The default value of the :EXTERNAL-CODE-FORMAT argument to OPEN should be
 implementation-defined rather than required to be the "natural" encoding
 (whatever that is).  The only requirement should be that it be able to
 encode the base character set.  It should not be restricted from encoding
 other character sets also.  There should be a name for this default value,
 probably :DEFAULT.

 There should be a name for the "natural" encoding and there should be a
 specification of the properties of the natural encoding that a programmer
 can rely on.  Suggestions for the name include :BASE, :NATURAL, and
 :INTERCHANGE.  The definition probably involves the concept of data
 interchange with non-Lisp programs on the same system.

 There should be names for standard encodings such as ASCII to allow
 data interchange between differing systems.

 There should be a defined value for the :CHARACTER-SET option that
 specifies all characters that the Lisp implementation can represent.  OPEN
 should signal an error if this :CHARACTER-SET option is used together with
 an :EXTERNAL-CODE-FORMAT option that cannot encode all the characters the
 Lisp implementation can represent.  Without this, there is no way to write
 a correct program that stores arbitrary strings in a file.

 The default value of the :ELEMENT-TYPE argument should be an
 implementation-defined subtype of CHARACTER that can be a supertype of
 BASE-CHARACTER, rather than specified to be exactly BASE-CHARACTER.

 It's hard to understand why both :CHARACTER-SET and :ELEMENT-TYPE exist,
 since they appear to control the same thing.  It would be best to remove
 :CHARACTER-SET and make sure that type-specifiers are expressive enough
 to allow :ELEMENT-TYPE to do everything that :CHARACTER-SET could do.
 The only justification for a separate :CHARACTER-SET option that can be
 inferred from the proposal is that :EXTERNAL-CODE-FORMAT :SHIFT-DELIMITED
 needs an -ordered- pair of character sets; this would be more appropriately
 specified as a list :EXTERNAL-CODE-FORMAT (:SHIFT-DELIMITED cs1 cs2).

 The guarantee on page 11 that input operations will never return characters
 outside the character sets mentioned in the :CHARACTER-SET option should
 be removed.  It seems wrong to require more checking in input functions
 than in output functions.  The :EXTERNAL-CODE-FORMAT might be capable
 of representing more characters than the :CHARACTER-SET option specifies.

 Are the external code format names listed on page 37 a proposal for
 standardized names, or merely illustrative examples?

 The motivations for the above comments are:
   - provide standard names for all portable concepts
   - allow, but not require, implementations to make it easy to write
     programs that work with multiple character sets without special effort
   - put the specification of the internal representation of characters
     in one and only one place in the options to OPEN
   - put the specification of the external representation of characters
     in one and only one place in the options to OPEN
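
 As an illustration of keeping the internal and external specifications
 in one place each, here is a minimal sketch.  It assumes the option
 names of the later ANSI Common Lisp standard, where OPEN takes
 :ELEMENT-TYPE for the internal side and :EXTERNAL-FORMAT (default
 :DEFAULT) for the external side, and not the draft's :CHARACTER-SET and
 :EXTERNAL-CODE-FORMAT.  ROUND-TRIP-STRING is a hypothetical helper, and
 the pathname is whatever writable file the caller supplies.

```lisp
;; Assumption: ANSI Common Lisp option names, not the draft's.
;; :ELEMENT-TYPE names the internal representation of the characters;
;; :EXTERNAL-FORMAT names the external encoding.
(defun round-trip-string (string pathname)
  "Write STRING to PATHNAME and read it back with the same options."
  (with-open-file (out pathname :direction :output
                                :if-exists :supersede
                                :element-type 'character
                                :external-format :default)
    (write-string string out))
  (with-open-file (in pathname :direction :input
                               :element-type 'character
                               :external-format :default)
    (let ((buffer (make-string (length string))))
      ;; READ-SEQUENCE returns the index of the first element not filled.
      (subseq buffer 0 (read-sequence buffer in)))))
```

 The round trip is only guaranteed for characters that the default
 external format can encode, which is the gap the comments above ask the
 proposal to close.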

Page 16 (referring to paragraph 6) implies that Space is not a graphic
character, but page 24 (referring to paragraph 6) implies that Space is
a graphic character.  CLtL p.235 says Space is graphic, let's stick with
that.

Pages 19 and 20 introduce a new type named simple-base-string, in addition
to simple-string.  If you think about how simple-string would be used for
compiler optimization, it makes sense for simple-string to be the name for
the single simplest representation, rather than a name for a whole family
of representations that would have to be discriminated at run time.  Thus
what you call simple-base-string should be called simple-string, and what
you call simple-string should just be called (simple-array character (*)).
This would not be an incompatible change in the meaning of simple-string.
Simple-string would be analogous to simple-vector.

Page 20 proposes to change (COERCE <integer> 'CHARACTER) incompatibly to be
synonymous with CODE-CHAR instead of INT-CHAR.  This change seems
unmotivated.  We would rather delete coercion from integers to characters
entirely, for the same reason that coercion from characters to integers is
not permitted.

Page 23 proposes an equivalence of CHAR-INT and CHAR-CODE, and of INT-CHAR
and CODE-CHAR.  This is unnecessary and should be removed.

The last bullet on page 23 should be removed.  Part of the definition of
each implementation-defined character attribute must be whether or not that
attribute is removed from symbol names by READ.  Also the phrase "symbol
construction" is ambiguous (does it mean READ or INTERN or MAKE-SYMBOL?)
and should be avoided.

Page 30 (referring to paragraph 24) and page 31 (referring to paragraph 2)
amend MAKE-SEQUENCE and MAKE-STRING.  There are several problems:  It fails
to make (MAKE-SEQUENCE 'STRING n) equivalent to (MAKE-STRING n), including
handling of the presence or absence of the :INITIAL-ELEMENT option.  It
fails to specify the default for the :ELEMENT-TYPE argument to MAKE-STRING.
Earlier there was much controversy about whether by default strings should
be base or extended, so it's really unfortunate that the proposal fails to
take any stand on this issue.  We propose that (MAKE-STRING n) and
(MAKE-SEQUENCE 'STRING n) return a base-string by default.  When the
:INITIAL-ELEMENT option is specified, they return the most specialized
type that can accommodate that character.
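
The rule proposed here for :INITIAL-ELEMENT can be sketched as follows.
This is an illustration only; it assumes the type names BASE-CHAR and
CHARACTER of the later ANSI Common Lisp standard and the :ELEMENT-TYPE
argument to MAKE-STRING.  MAKE-NARROWEST-STRING is a hypothetical name,
not part of any proposal.

```lisp
;; Assumption: ANSI Common Lisp, where MAKE-STRING accepts :ELEMENT-TYPE
;; and BASE-CHAR is a subtype of CHARACTER.
(defun make-narrowest-string (n initial-element)
  "Make a string of length N filled with INITIAL-ELEMENT, using the most
specialized of BASE-CHAR and CHARACTER that can hold that character."
  (make-string n
               :initial-element initial-element
               :element-type (if (typep initial-element 'base-char)
                                 'base-char
                                 'character)))
```

With a standard character such as #\a the result is a base string; with
an extended character it falls back to the general string type.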


EDITORIAL COMMENTS

Shouldn't there be a reference to relevant ISO document(s) in the
bibliography?

The format of the later portion of the proposal, referring to locations
in CLtL by numbering paragraphs, is hard to follow.  It would help to
mention a page number and a function name.  In general, it is preferable
to propose what the Common Lisp language should be rather than to propose
how Guy Steele's book should be altered.

The page 14 description of the standard character subrepertoire needs an
example.  There is an obvious candidate, namely $.  The ISO character #o044
is a currency sign.  Many ASCII terminals overseas have a glyph other than
dollar sign for this (e.g. Pound Sterling or Yen).

Page 15's table appears to contain some typographical errors (LV22, LX22,
the glyph for capital J is K) so we don't trust the table at all.  Also,
what are these IDs?  They don't appear anywhere else in the proposal.

∂25-Oct-88  1021	CL-Characters-mailer 	cs proposal  
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 25 Oct 88  10:20:51 PDT
Date: Tue, 25 Oct 88 09:51:42 PDT
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
cc: yuasa%tutics.tut.junet%utokyo-relay.csnet@relay.cs.net,
    rpk@wheaties.ai.mit.edu, brent%hpfcpsb@hplabs.hp.com,
    sandra%defun@cs.utah.edu, vanroggen%atig.dec@decwrl.dec.com,
    moon@stony-brook.scrc.symbolics.com
Message-ID: <881025.095142.baggins@IBM.com>
Subject: cs proposal

  I would like to express our thanks for the various comments received
regarding the characters proposal.  The comments were very encouraging
but pointed out the need for further improvements in the document.
We (Bob Kerns, Jerry Duggan, Dave Unietis, Paul Beiser and I)
spent a marathon 15 hr subcommittee session last Monday at Wash DC
going over all your suggestions.  I also thank Jerry Duggan (hp) and
Dave Unietis (lucid) for joining the subcommittee and their
contributions.

Given the need for revision, we did not request X3J13 to vote on the
DRAFT:...  proposal.  Instead, a vote will be scheduled for the Jan
meeting on the revised document.
J13 did vote to submit the DRAFT: ... proposal along with
comments received and responses to ISO as a working contribution.
J13 also voted (as gray recommended) to express a requirement to
ISO and ANSI for standardized repertoire and character set names.

A revised proposal (with DRAFT removed!) will be ready by mid-Nov.
I'm going to append to this forum my (short) summary of the subcommittee
responses to your comments.
Please let me know if we (or I) have missed any of your points.

Regards,
  Thom

∂25-Oct-88  1122	CL-Characters-mailer 	Re: cs proposal   
Received: from Xerox.COM by SAIL.Stanford.EDU with TCP; 25 Oct 88  11:22:31 PDT
Received: from Semillon.ms by ArpaGateway.ms ; 25 OCT 88 11:03:47 PDT
Date: 25 Oct 88 11:03 PDT
From: masinter.pa@Xerox.COM
Subject: Re: cs proposal
In-reply-to: Thom Linden <baggins@ibm.com>'s message of Tue, 25 Oct 88
 09:51:42 PDT
To: Thom Linden <baggins@ibm.com>
cc: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>,
 yuasa%tutics.tut.junet%utokyo-relay.CSNet@relay.cs.net,
 rpk@wheaties.ai.mit.edu, brent%hpfcpsb@hplabs.hp.com,
 sandra%defun@cs.utah.edu, vanroggen%atig.dec@decwrl.dec.com,
 moon@stony-brook.scrc.symbolics.com
cc: cl-cleanup@sail.stanford.edu
Message-ID: <881025-110347-11299@Xerox>

Thom:

The proposal did not ever say explicitly, and I feel strongly that it
should, that in fact Common Lisp *requires* absolutely no changes in order
to support extended character sets. The language as specified in CLtL is
entirely adequate to allow handling of multiple, international character
sets. The implementation of Xerox Common Lisp, now available on Xerox 1100
series workstations and Sun 3 and Sun 4 workstations, is an existence
proof.

There is a price, however, that implementations must pay in order to use
CltL unchanged. The price can either be in terms of space -- reserving
enough bits per character in a string -- or in terms of speed, in
implementations in which not all strings are displaceable. (Briefly, the
implementation technique is to allocate a smaller number of bits per
character than the maximum in most strings, but allow for strings to be
displaced.)

Thus, the only changes to the language that can be justified from the point
of view of allowing support of International Character Sets are those that
have an arguably more efficient implementation.  The change to the type
hierarchy, the modification of the STRING type from an abbreviation of
(VECTOR STRING-CHAR) to an indefinite union of types, and the various
changes associated with that to STRINGP etc. should be justified by some
explicit rationale as to the efficiency of the implementations under that
regime.

The Character Proposal includes several other enhancements and
modifications which are probably good ideas, but which require separate
discussion and justification. Removing CHAR-FONT is a good idea, because
the feature is not used. Removing CHAR-BITS is probably a good idea,
because the feature is not used widely, and was (perhaps) based on a design
decision which confounded keystrokes with characters and which allowed for
"characters" which could not be held in "strings". These changes can and
should be justified completely independently of any notion of
"international character set handling".

Extending the "CHARACTER" type specifier to have a list form is probably a
good idea, although it does not conform to any current practice, as it
allows a single existing mechanism to provide what is otherwise an
overambitious proliferation of character-predicate functions in an easily
extensible manner. (Implementations that do not support hiragana can easily
support (typep x '(character :hiragana)) == NIL.) This is related to the
support of international character sets, but hardly required. Certainly
this does not go far enough to allow portable programs to be written. For
one small example, the proposal is silent about what char-upcase and
char-downcase do on greek or cyrillic characters, or accented characters in
european alphabets. Would not portable language manipulation programs
need portable definitions of these? Is there some reason in principle for
not agreeing on the operation of CHAR-UPCASE? Are hiragana characters
ALPHA-CHAR-P? Etc.

The discussion of the proposal frequently confounds two separate
distinctions, of "backward compatibility" and "portability". Our primary
goal is to allow "portable" programs -- programs that, if written in the
standard language, will run unchanged in all implementations that support
the standard language.  We try to achieve that while also supporting
"backward compatibility" -- programs that run in current implementations of
Common Lisp should continue to work correctly unchanged. I fear that the
proposal, in the name of efficiency and backward compatibility,  damages
the portability of the resulting language, because it allows programs to
rely on implementation-dependent details of the nature of strings. It
damages backward compatibility, because valid programs that manipulated
strings will no longer be correct. And it does not make a convincing
argument that it has actually solved the problem of *efficient*
international character set handling.

I think you've done an admirable job of establishing the scope and extent
of possible changes to Common Lisp in the area of character handling.
However, I would like to see the proposal split up into separate "issues"
which each have their own pros and cons. I would like to see this happen so
that the cleanup committee doesn't have to clean up after the character
committee is done. 

Sincerely,

Larry Masinter


∂26-Oct-88  1724	CL-Characters-mailer 	Re: cs proposal   
Received: from STONY-BROOK.SCRC.Symbolics.COM (SCRC-STONY-BROOK.ARPA) by SAIL.Stanford.EDU with TCP; 26 Oct 88  17:24:26 PDT
Received: from EUPHRATES.SCRC.Symbolics.COM by STONY-BROOK.SCRC.Symbolics.COM via CHAOS with CHAOS-MAIL id 482647; Wed 26-Oct-88 20:19:31 EDT
Date: Wed, 26 Oct 88 20:19 EDT
From: David A. Moon <Moon@STONY-BROOK.SCRC.Symbolics.COM>
Subject: Re: cs proposal
To: masinter.pa@Xerox.COM
cc: Thom Linden <baggins@ibm.com>, "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>,
    yuasa%tutics.tut.junet%utokyo-relay.CSNet@relay.cs.net, rpk@wheaties.ai.mit.edu,
    brent%hpfcpsb@hplabs.hp.com, sandra%defun@cs.utah.edu, vanroggen%atig.dec@decwrl.dec.com,
    cl-cleanup@sail.stanford.edu
In-Reply-To: <881025-110347-11299@Xerox>
Message-ID: <19881027001914.5.MOON@EUPHRATES.SCRC.Symbolics.COM>
Line-fold: No

    Date: 25 Oct 88 11:03 PDT
    From: masinter.pa@Xerox.COM

    The proposal did not ever say explicitly, and I feel strongly that it
    should, that in fact Common Lisp *requires* absolutely no changes in order
    to support extended character sets. The language as specified in CLtL is
    entirely adequate to allow handling of multiple, international character
    sets. The implementation of Xerox Common Lisp, now available on Xerox 1100
    series workstations and Sun 3 and Sun 4 workstations, is an existence
    proof.

So is Symbolics Genera.  However, I doubt that programs and data files
written to exploit extended character sets are portable between
Symbolics and Xerox (Envos).  For programs, the primitives defined by
Common Lisp are insufficient for meaningful manipulation of multiple
character sets.  For data files, Common Lisp says nothing about how
characters are represented.

Isn't portability between implementations the whole reason for additional
standardization in this area?

Most of your comments are consistent with the comments that Symbolics
sent in, I think.  Also your desire for better specification of such
things as what alphabetic case means in international character sets is
consistent with comments from International Lisp Associates that I saw.

    The discussion of the proposal frequently confounds two separate
    distinctions, of "backward compatibility" and "portability". Our primary
    goal is to allow "portable" programs -- programs that, if written in the
    standard language, will run unchanged in all implementations that support
    the standard language.  We try to achieve that while also supporting
    "backward compatibility" -- programs that run in current implementations of
    Common Lisp should continue to work correctly unchanged. I fear that the
    proposal, in the name of efficiency and backward compatibility,  damages
    the portability of the resulting language, because it allows programs to
    rely on implementation-dependent details of the nature of strings. It
    damages backward compatibility, because valid programs that manipulated
    strings will no longer be correct. And it does not make a convincing
    argument that it has actually solved the problem of *efficient*
    international character set handling.

I'd like to see some more detail on these allegations, particularly the
one about allowing programs to rely on implementation-dependent details,
since unlike your other comments I don't agree with them, or maybe I
don't know what exactly you're saying.

    I think you've done an admirable job of establishing the scope and extent
    of possible changes to Common Lisp in the area of character handling.
    However, I would like to see the proposal split up into separate "issues"
    which each have their own pros and cons.

I think this is a good idea for issues that are truly separable, such as
removing char-bits, but splitting up the kernel of the proposal seems to
me to be just more work for both writers and readers, so I'd rather
see the main proposal remain all in one piece.

∂27-Oct-88  1159	CL-Characters-mailer 	Re: cs proposal   
Received: from Xerox.COM by SAIL.Stanford.EDU with TCP; 27 Oct 88  11:58:47 PDT
Received: from Semillon.ms by ArpaGateway.ms ; 27 OCT 88 11:03:48 PDT
Date: 27 Oct 88 11:03 PDT
From: masinter.pa@Xerox.COM
Subject: Re: cs proposal
In-reply-to: David A. Moon <Moon@STONY-BROOK.SCRC.Symbolics.COM>'s message
 of Wed, 26 Oct 88 20:19 EDT
To: David A. Moon <Moon@STONY-BROOK.SCRC.Symbolics.COM>
cc: cl-characters@sail.stanford.edu
Message-ID: <881027-110348-15668@Xerox>

Your point about portability is interesting, because I had not thought
there was a portability issue between implementations that currently
conformed to CLtL.

Of the two aspects, programs and data files, I think only "programs" is the
issue for X3J13. The various standardization bodies across the world are
trying with varying degrees of success to deal with data format
compatibility. Envos Medley normally writes out files using the Xerox
Character Standard encoding. I imagine that it would be possible to write a
program in any language which, using binary I/O, could convert those data
files to and from whatever representation Symbolics uses for representing
files. I imagine we could even fix the implementation to read and write
Symbolics data files as well as Xerox data files, or Apple or JIS-coded
Unix. Frequently, the "file transfer" mechanism by which data files get
transported from one computer system to another incorporates data format
transformation algorithms as well, e.g., to convert between EBCDIC and
ASCII; I imagine this mechanism could be extended to cover the
transformation between other character encoding mechanisms, at least for
files of "straight" text. These transformation mechanisms should be
adequate for dealing with Common Lisp source program files as well, right?

On the issue of portability of programs, I had assumed that programs
written for Symbolics machines that dealt with, say, mixed Kanji and Roman
characters, could in fact be run unchanged in Medley. I think I understand
the Symbolics mechanism for dealing with international characters, and I
didn't see any portability problems following current CLtL. I really
thought the issue was only that some implementations really distinguished
between simple and displaceable strings.

Since you "doubt that programs and data files written to exploit extended
character sets are portable..." perhaps you might be able to construct a
small example that illustrates this point? 

Briefly, the Medley implementation is such that CHAR-CODE-LIMIT is 65536,
all "extended" characters just have codes above 255. The implementation
hides any visible distinction between strings that have 8-bits per
character and those that have 16, i.e., if you try to store (int-char 1234)
into a string that started out with only 8-bits per character, it quietly
displaces the string to one with 16-bits per character. The external format
uses run-coding to allow ASCII files to be used unchanged and compress the
output; the transformation between external format and internal is handled
by READ-CHAR and WRITE-CHAR (and their internal equivalents used by READ.)
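
The widening technique can be approximated in portable code.  The sketch
below is not the Medley implementation, which does this invisibly on the
string object itself; here the string is held inside a wrapper so that it
can be replaced by a wider one.  FLEX-STRING, MAKE-FLEX-STRING and
FLEX-CHAR are hypothetical names, and the code assumes the BASE-CHAR and
CHARACTER types of the later ANSI Common Lisp standard in place of 8-bit
and 16-bit strings.

```lisp
;; A wrapper whose DATA slot holds the current backing string.
(defstruct (flex-string (:constructor %make-flex-string (data)))
  data)

(defun make-flex-string (n)
  "Start with the narrow representation, filled with spaces."
  (%make-flex-string (make-string n :element-type 'base-char
                                    :initial-element #\Space)))

(defun flex-char (fs index)
  "Read the character at INDEX."
  (char (flex-string-data fs) index))

(defun (setf flex-char) (new-char fs index)
  "Store NEW-CHAR at INDEX, widening the backing string if required."
  (let ((data (flex-string-data fs)))
    (unless (typep new-char (array-element-type data))
      ;; The narrow string cannot hold NEW-CHAR: copy the contents into
      ;; a string of the general character type and switch to it.
      (let ((wide (make-string (length data) :element-type 'character)))
        (replace wide data)
        (setf (flex-string-data fs) wide
              data wide)))
    (setf (char data index) new-char)))
```

The cost described above shows up as the extra indirection through the
wrapper on every access, plus a copy on the first store of a wide
character.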

You made several other good points in response to my message, which I will
reply to, but I wanted to get straight on this one first, since there seems
to be some divergence on the issue of portability of the current CLtL
definition.

Thanks,

Larry

∂27-Oct-88  1751	CL-Characters-mailer 	ANSI and ISO Committee liasons   
Received: from Xerox.COM by SAIL.Stanford.EDU with TCP; 27 Oct 88  17:51:44 PDT
Received: from Semillon.ms by ArpaGateway.ms ; 27 OCT 88 14:18:14 PDT
Date: 27 Oct 88 14:18 PDT
From: masinter.pa@Xerox.COM
Subject: ANSI and ISO Committee liasons
To: cl-characters@sail.stanford.edu
Message-ID: <881027-141814-1155@Xerox>

The character encoding committees that X3J13 should establish liaison with
should include X3L2. X3L2 is establishing encodings for international
character sets. X3L2 is preparing a US position for the ISO group ISO JTC1
SC2 WG2. I imagine that ISO JTC1 SC22 WG16 should ask for a liaison with ISO
JTC1 SC2 WG2 also.


"ISO has apparently reserved the number 10646 for this standard. (analogous
to ISO 646, the fundamental seven bit character code standard)"

I noticed in Thom's report that the recommendation was to establish some
liaisons with some of the ANSI and ISO committees on character exchange,
however I thought the committee numbers were different. Did I
misunderstand?

∂29-Oct-88  1232	CL-Characters-mailer 	cs proposal comments   
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 29 Oct 88  12:32:36 PDT
Date: Sat, 29 Oct 88 11:43:34 PDT
From: Thom Linden <baggins@ibm.com>
To: "Sandra Loosemore" <sandra%defun@cs.utah.edu>
cc: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <881029.114334.baggins@IBM.com>
Subject: cs proposal comments

Sandra,
  Thanks for your comments.  We will include your suggested change
in the revised proposal.


>>    The only thing I really found confusing about this proposal was the
>>     elimination of the semi-standard characters and section 2.2.3 on
>>     non-standard characters.  With that gone, it is not left entirely
>>     clear that an implementation may support other named characters
>>     besides #\space and #\newline, until the reader gets to chapter 14 and
>>     the description of CHAR-NAME, where we are told that all non-graphic
>>     characters have names.  I really think that chapter 2 ought to include
>>     something to the effect that an implementation can support named
>>     characters that are not in the standard character set.

∂29-Oct-88  1232	CL-Characters-mailer 	JIS comments on proposal    
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 29 Oct 88  12:31:49 PDT
Date: Sat, 29 Oct 88 11:33:48 PDT
From: Thom Linden <baggins@ibm.com>
To: "Dr. Taiichi Yuasa"
    <yuasa%tutics.tut.junet%utokyo-relay.csnet@relay.cs.net>
cc: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <881029.113348.baggins@IBM.com>
Subject: JIS comments on proposal

Dr. Yuasa,

  Thank you for the comments on the characters proposal.  These and
other comments have encouraged us to further improve the document.

A revised proposal (with DRAFT removed!) will be ready in Nov.

  Please note the comment below that we encourage JIS to propose
an equivalency mechanism to be incorporated into this proposal.


==========================================================

>>     (a) Some technical terms are not clear.  For instance,
>>
>>         1. The notion of "character set" is not defined.
>>            This term appears several times in the draft:
>>             "more than one character set" (page 6, line 16)
>>             "multiple character sets" (page 6, line 17)
>>            Does it mean "character repertoire" or "coded character set"?
>>            This distinction must be clear because there may be
>>            multiple coded character sets for a single character repertoire,
>>            which is our case in Japan.

  The term 'character set' was intended to uniformly mean 'coded
character set'.  This abbreviation won't be used in the revision.
Let me try with the JIS example:

        The (glyphs seen in JIS0208) repertoire, which defines a
          unique meaning for each glyph.  This repertoire consists
          of the following subrepertoires:
              Special Symbols subrepertoire
              Latin subrepertoire
              Greek    subrepertoire
              Cyrillic subrepertoire
              Hiragana subrepertoire
              Katakana subrepertoire
              Kanji    subrepertoire

       JIS0208 is a coded character set which defines a unique
         code for each glyph within the repertoire.  (Note that
         it is not made up of separate coded character sets;
         JIS0208 defines a mapping for the repertoire in total.)

       Now, one of the difficulties is that many implementations
       support both a JIS0208 coded character set as well as variants
       of the ISO646 coded character set (the corresponding ISO646
       repertoire also contains a Latin subrepertoire).
       Files often contain mixtures of both coded character sets
       and display characteristics are often (always?) different
       for each coded character set.  For example, the JIS0208
       encoded Latin glyphs may have a wider display width than
       the ISO646 encoded Latin counterpart.  We define Common
       Lisp characters as being uniquely identified by their codes.
       The Common Lisp internal code representation is left
       undefined, but each implementation defines a mapping from
       external encodings to a single uniform internal coding.
       This mapping is performed by the I/O mechanisms.

       An implementation is free to decide on the external/internal
       mapping.  Whether the two encodings for the Latin glyph 'A'
       are treated as distinct Lisp characters or are mapped by
       the I/O into a single Lisp character is up to the implementation.
       (Of course, if mapped into a single Lisp character it is
       a one-way mapping).  This is precisely where the equivalency
       mechanism (mentioned below) comes up.  It appears that
       there is no 'right' answer for this implementation choice.
       Because of display and file environments, users may require
       this be an application choice, not the implementation's!
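
The external/internal mapping described above can be sketched in portable
Common Lisp.  This is only an illustration under stated assumptions: the
two tables, the glyph names, and the function EXTERNAL->INTERNAL are
hypothetical and do not come from the proposal.  The sample codes are the
usual ones for Latin capital A and B (#x41, #x42 in ISO 646; #x2341,
#x2342 in JIS X 0208).

```lisp
;; Hypothetical model: a coded character set is an alist that maps an
;; external code to a glyph name taken from a repertoire.
(defparameter *iso646-codes*
  '((#x41 . :latin-capital-a) (#x42 . :latin-capital-b)))

(defparameter *jis0208-codes*
  '((#x2341 . :latin-capital-a) (#x2342 . :latin-capital-b)))

(defun external->internal (code coded-set &key (unify t))
  "Map external CODE in CODED-SET to an internal character designator.
With UNIFY true, both encodings of a glyph yield the same internal
object; otherwise the coded set stays part of the character's identity.
Returns NIL for a code the table does not know."
  (let* ((table (ecase coded-set
                  (:iso646 *iso646-codes*)
                  (:jis0208 *jis0208-codes*)))
         (glyph (cdr (assoc code table))))
    (cond ((null glyph) nil)
          (unify glyph)
          (t (list coded-set glyph)))))
```

With :UNIFY T the origin of the character is lost on input, which is the
one-way mapping mentioned above; with :UNIFY NIL the two Latin 'A'
encodings remain distinct internal characters.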

>>
>>         2. It is not clear What "an implementation SUPPORTs a character set"
>>            means.
>>            We ourselves have discussed what "support" means but have not
>>            found any reasonable definition yet.
>>
>>         3. The sentence "it must define the sets supported and their
>>            characteristics" (page 6, line 17) is quite vague.

     An implementation must have an internal encoding and
at least one external-code-format; it must also
define the effect of alpha-p, uppercase-p, etc., and the I/O keywords.
  The revision will include a list of these 'support' items.

>>
>>     (b) The relation among the string types is not clear.
>>         We guessed
>>         SIMPLE-BASE-STRING is a subtype of BASE-STRING, and
>>         BASE-STRING is a subtype of MOST-GENERAL-STRING,
>>         but we are not sure.
>>         Also, we guessed
>>         MOST-GENERAL-STRING is identical to STRING,
>>         but then what is the role of the name MOST-GENERAL-STRING?
>>         We do not know whether the draft suggests the possibility that
>>         there are some strings that are not MOST-GENERAL-STRING.

  Simple-base-string is a subtype of base-string.
  Base-string is a subtype of string.  (and can hold only elements
       of type base-character)

  Most-general-string is a subtype of string. (and can hold elements
      of type character or of any subtype of character)

  String is exhaustively separated into a set of strings of form:
                  (array  specialized to hold x)

  Thus Most-general-string is not equivalent to string.  This is why
the functions which take a string type specifier became ambiguous.

  An implementation may provide string subtypes other than
these 'standard' ones.  The example in the proposal for
region-specialized-string is a sample of what might be provided
and is not a 'standard' type.

  The revised document will add to the definitions and present a
clear description.
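
The subtype relations stated above can be checked with SUBTYPEP.  The
sketch below assumes ANSI Common Lisp as later standardized, not this
draft: it uses only the type names SIMPLE-BASE-STRING, BASE-STRING,
STRING and BASE-CHAR, and leaves out MOST-GENERAL-STRING, which is the
draft's name.  SUBTYPE-CHAIN-P is a hypothetical helper.

```lisp
(defun subtype-chain-p (&rest types)
  "True if each type in TYPES is a recognizable subtype of the next one."
  (loop for (sub super) on types
        while super
        always (subtypep sub super)))

;; A string specialized to hold only base characters is a BASE-STRING,
;; and therefore also a STRING.
(defun base-string-example ()
  (let ((s (make-string 3 :element-type 'base-char :initial-element #\a)))
    (list (typep s 'base-string) (typep s 'string))))
```

Whether STRING is in turn a subtype of BASE-STRING depends on whether the
implementation has characters outside BASE-CHAR, so the sketch does not
test that direction.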

>>
>>     (c) The draft specification leaves too many things unspecified.
>>         We wonder if the specification will increase the international portability
>>         of application programs.
>>         Most of us would like to include Kanji characters in BASE-CHARACTER but
>>         some of us would rather like to put them in EXTENDED-CHARACTER.
>>         We found no description on this issue in the draft.

  We will add a Kanji example to p9. and some more rationale for the
structure proposed.  The type structure has a penalty of complexity but
our goal is to encourage implementations to provide this support.
The structure allows implementations the flexibility to provide
the support in an efficient manner based on their underlying hardware.
  The users do not suffer since they are able to write code which is
portable across various implementations.
  The two alternatives you mention are valid implementation choices.
User code is portable between both variants.

>>
>>     (d) We need some mechanism for syntactic equivalency among characters, such
>>         as the one proposed by Thom Linden.  We are wondering why such an
>>         important mechanism is not included in the draft.


  We believe this is an important issue also but haven't any direct
user or implementation experience at this point.  The subcommittee
would like to request JIS to propose the equivalency mechanism
their experience indicates is needed.  We would like to include
a JIS supported equivalency mechanism in the revision if possible.


∂29-Oct-88  1928	CL-Characters-mailer 	cs proposal comments   
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 29 Oct 88  19:27:58 PDT
Date: Sat, 29 Oct 88 19:06:12 PDT
From: Thom Linden <baggins@ibm.com>
To: "Larry Masinter" <Masinter.pa@xerox.com>
cc: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <881029.190612.baggins@IBM.com>
Subject: cs proposal comments

----------------------------------------------------------------

Larry:  This should be copied to David Gray at TI but my mailer
doesn't acknowledge that domain.  Can you forward the note?  Thanks.
       Gray@DSG.csc.ti.com

----------------------------------------------------------

David,  thanks for the suggestions.  We'll be revising the document
to incorporate clarification and/or new function during November.

>>    The semi-standard characters have been deleted without any reason given.

The fact that these are not seen as a generally useful or common
set of characters in existing implementations will be added to the
revision.

>>
>>    There doesn't seem to be any way to find out which repertoire a given
>>    character object belongs to.

The problem here is that a character object may be a member of several
repertoires.  We have added a global name *all-repetoire-names*
which is a list of all repertoires the implementation supports.  At
a minimum it will contain :BASE and :STANDARD.  Thus it will be
possible (using characterp) to determine the repertoire(s) to which
an object belongs.
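
A minimal sketch of such a membership query, assuming one predicate per
repertoire name.  The table *REPERTOIRE-PREDICATES* and the function
CHARACTER-REPERTOIRES are hypothetical and are not the proposal's
interface; the predicates use ANSI Common Lisp names (STANDARD-CHAR-P,
BASE-CHAR) rather than the draft's.

```lisp
;; Hypothetical table: repertoire name -> membership predicate.
(defparameter *repertoire-predicates*
  (list (cons :standard #'standard-char-p)
        (cons :base (lambda (c) (typep c 'base-char)))))

(defun character-repertoires (char)
  "Return the names of all known repertoires that contain CHAR."
  (check-type char character)
  (loop for (name . predicate) in *repertoire-predicates*
        when (funcall predicate char)
          collect name))
```

Because a character can satisfy several predicates, the result is a list
rather than a single name, which is the difficulty described above.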

>>
>>    There doesn't seem to be any way to construct a character object for a
>>    particular code and repertoire.

Right.  There were in some of the original proposals but these were
dropped due to concerns of portability.  After a long discussion
prompted by your question, we have added new functions to construct
and decompose a character object.

>>
>>    It is not clear what the meaning of CHAR-CODE-LIMIT is now.  Does the
>>    char code include identification of the repertoire?  If so, it would
>>    seem to be of little use.  If not, then wouldn't the maximum code value
>>    be different for different repertoires?  If the code size is not able to
>>    be different for different repertoires, then I don't see how the concept
>>    of repertoires needs to be different from the old concept of font
>>    numbers.

  Our intent is that all external encodings get mapped to a single
uniform encoding within the Lisp environment.  char-code-limit is
the maximum value possible for the internal coding within a given
implementation.  Repertoires do not have an associated code; repertoires
are unordered sets of glyphs.  A coded character set assigns a unique
code to each member of a given repertoire.
  Fonts, basically, define a unique display 'style' for a given
repertoire.  The Latin repertoire, for example, has numerous display
styles (sometimes called designer glyphs).  While we could have
(perhaps) reused the font attribute this has two problems, 1) it
leaves us with a compatibility problem for implementations
which will continue to support fonts and 2) they are really
two separate concepts and should not be overloaded.
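
Since there is one internal coding, one limit covers every character
regardless of repertoire.  A short sketch, assuming ANSI Common Lisp,
where CHAR-CODE-LIMIT is an exclusive upper bound on the values CHAR-CODE
returns; the two helper functions are hypothetical names.

```lisp
(defun valid-char-code-p (n)
  "True if N lies in the range of codes that CHAR-CODE can return."
  (and (integerp n) (<= 0 n) (< n char-code-limit)))

(defun codes-within-limit-p (string)
  "True if every character of STRING has a code below CHAR-CODE-LIMIT."
  (every (lambda (c) (valid-char-code-p (char-code c))) string))
```

The actual value of CHAR-CODE-LIMIT differs between implementations, so
portable code should compare against the constant rather than a number.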

>>
>>    Does it really help to define standard keywords :CHARACTER-SET and
>>    :EXTERNAL-CODE-FORMAT for the OPEN function if there are no standard
>>    values for them?   Also, if these are not specified when opening for
>>    input, instead of specifying that the default is the base character set,
>>    should permit defaulting from what the file system knows about how the
>>    file was written.
>>
  Right.  We are stating a requirement to ISO and X3 for standard
names for repertoires and encodings.  The default will be :DEFAULT
which is implementation defined.

∂30-Oct-88  2132	CL-Characters-mailer 	cs proposal comments   
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 30 Oct 88  21:32:10 PST
Date: Sun, 30 Oct 88 21:19:45 PST
From: Thom Linden <baggins@ibm.com>
To: "Walter Van Roggen" <vanroggen%aitg.DEC@decwrl.dec.com>
cc: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <881030.211945.baggins@IBM.com>
Subject: cs proposal comments

Walter,  please pass our thanks on to Ron Brender for the comments
on the proposal.  We'll be incorporating these suggestions into the
November revision.

=====================================================================
>>    (See Section  1.1,  pp3-4;  also  A.13.4,  pp27-28.)  It  might  be
>>    appropriate  to  note  in  this  introduction  that  the objectives
>>    intentionally exclude a variety of issues that often come under the
>>    title "Internationalization".  Such topics are things like date and
>>    time formats, and the like.  This would not be worth mentioning but
>>    for  the fact that char-upcase and char-downcase provide operations
>>    that should be dependent on information outside  of  the  character
>>    set as such to perform properly.  The standard example is lowercase
>>    e-acute, which  should  convert  to  uppercase  E-acute  in  French
>>    French, but to uppercase E without acute in Canadian French.  There
>>    are many other examples even within the ISO Latin-1 character set.
>>
>>
>>    In the absence of a more  general  attack  on  internationalization
>>    issues  (I  don't  recommend  such an effort on the part of LISP at
>>    this time -- wait for others  to  lead  the  way)  the  meaning  of
>>    char-upcase  and  char-downcase  for characters outside of the LISP
>>    standard  character  set  should   be   explicitly   specified   as
>>    implementation-defined.

We'll add the disclaimer to the revision.  We are also adding that
an implementation must state the effect of uppercase, etc. on all
supported repertoires.

>>
>>
>>
>>    (See Section 2.1.) I support the decision to avoid any  attempt  to
>>    provide   names   for   particular   character  sets  and/or  other
>>    repertoires at this time.  Yet, it seems clear that portability  of
>>    applications  would  be  enhanced  if  there  were  an  established
>>    lexicon.  Moreover, this same issue will surely arise in  one  form
>>    or  another  in the context of every programming language that adds
>>    capability for large and/or multiple character sets.
>>
>>
>>    I suggest X3J13 send a request to, most likely,  ISO-IEC  JTC1/SC22
>>    (Programming  Languages)  recommending  issues  such  as  this that
>>    should be  addressed  across  programming  languages,  probably  in
>>    conjunction  with  SC2 (Character Sets and Controls) and SC21 (Data
>>    Bases).
X3J13 voted in November (based on this suggestion) to state such
a requirement to ISO and X3.

>>
>>
>>
>>    (See Section 2.3, p11; also A.22.3.1, p35 [A.22.3.2 is  missing?].)
>>    The  external-width  function  is  most  appropriate.  Further, the
>>    sentence observing that this function "does not address the problem
>>    of  display  width" should be emphasized more strongly, even in the
>>    absence of proportional fonts.

  We are adding that this function only applies to streams with specific
characteristics.  We've recommended that J13 address the issue of
display streams separately from this proposal.

>>
>>
>>    Further,  even  the  suggestion  that   the   format-external-width
>>    variable  is  relevant  to  producing  columnar  output  for (only)
>>    certain external code formats deserves to be stricken entirely from
>>    this  discussion.   If  the production of simple columnar output is
>>    worthwhile -- and I think it is -- then I urge  that  X3J13  search
>>    for a means to achieve this that is independent of artifacts of the
>>    external  representation.   The  suggested  approach   happens   to
>>    more-or-less   work   at  the  moment  with  many  common  external
>>    representations, but this is less likely to continue to  be  true
>>    in  the  future  --  in  particular,  as  the  Multiple-Octet Coded
>>    Character Set being defined by ISO-IEC JTC1/SC2/WG2 comes into  use
>>    (an ISO DP is expected out by the end of this year (1988)).
>>
>>
>>    Since assisting with columnar output seems to be the  sole  purpose
>>    of  this variable, I recommend it be withdrawn completely from this
>>    proposal.
>>
We agree that this part of the proposal is fairly limited in use.  If
it were preceded by several implementations providing similar
(not identical) semantics, it would then be justified.  We are
dropping the part of the proposal dealing with format.  This
issue will be clearer given time and experience with implementation
attempts at solutions. (It is clear that we will have such attempts).



∂31-Oct-88  1012	CL-Characters-mailer 	Re: cs proposal comments    
Received: from Xerox.COM by SAIL.Stanford.EDU with TCP; 31 Oct 88  10:12:14 PST
Received: from Semillon.ms by ArpaGateway.ms ; 31 OCT 88 10:08:06 PST
Date: 31 Oct 88 10:07 PST
From: masinter.pa@Xerox.COM
Subject: Re: cs proposal comments
In-reply-to: Thom Linden <baggins@ibm.com>'s message of Sat, 29 Oct 88
 19:06:12 PDT
To: Thom Linden <baggins@ibm.com>
cc: "Larry Masinter" <Masinter.pa@Xerox.COM>, "X3J13: Character
 Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <881031-100806-5846@Xerox>

Thom --

My mail to Gray@DSG.csc.ti.com bounced too. Maybe his mail server changed names somehow.


∂31-Oct-88  1832	CL-Characters-mailer 	Re: cs proposal   
Received: from STONY-BROOK.SCRC.Symbolics.COM (SCRC-STONY-BROOK.ARPA) by SAIL.Stanford.EDU with TCP; 31 Oct 88  18:32:10 PST
Received: from EUPHRATES.SCRC.Symbolics.COM by STONY-BROOK.SCRC.Symbolics.COM via CHAOS with CHAOS-MAIL id 485003; Mon 31-Oct-88 21:31:39 EST
Date: Mon, 31 Oct 88 21:31 EST
From: David A. Moon <Moon@STONY-BROOK.SCRC.Symbolics.COM>
Subject: Re: cs proposal
To: masinter.pa@Xerox.COM
cc: cl-characters@sail.stanford.edu
In-Reply-To: <881027-110348-15668@Xerox>
Message-ID: <19881101023112.9.MOON@EUPHRATES.SCRC.Symbolics.COM>
Line-fold: No

    Date: 27 Oct 88 11:03 PDT
    From: masinter.pa@Xerox.COM

    Your point about portability is interesting, because I had not thought
    there was a portability issue between implementations that currently
    conformed to CLtL.

If there weren't such portability issues, we would hardly need a cleanup
committee so much.

    Of the two aspects, programs and data files, I think only "programs" is the
    issue for X3J13. The various standardization bodies across the world are
    trying with varying degrees of success to deal with data format
    compatibility. Envos Medley normally writes out files using the Xerox
    Character Standard encoding....

And Symbolics normally writes out files using the Symbolics character
encoding.  Don't you think that is a Tower of Babel and a problem?

I think portability of data -is- an issue for X3J13.  X3J13 should not be
creating new standards for data, but X3J13 -should- be making existing or
proposed standards for data accessible from Common Lisp.  I thought the
character committee did a fairly good job of that, although Symbolics
quarrelled with some of the particular details.  Before the character
committee's proposal, Common Lisp had no way for the programmer to
specify what external data representation to use.

    I imagine that it would be possible to write a
    program in any language which, using binary I/O, could convert those data
    files to and from whatever representation Symbolics uses for representing
    files. I imagine we could even fix the implementation to read and write
    Symbolics data files as well as Xerox data files, or Apple or JIS-coded
    Unix. 

But there is no portable way to tell the implementation to do that.

	  Frequently, the "file transfer" mechanism by which data files get
    transported from one computer system to another incorporate data format
    transformation algorithms as well, e.g., to convert between EBCDIC and
    ASCII; I imagine this mechanism could be extended to cover the
    transformation between other character encoding mechanisms, at least for
    files of "straight" text. These transformation mechanisms should be
    adequate for dealing with Common Lisp source program files as well, right?

Yes, Common Lisp source program files have the same issues as data files.
Currently, both kinds of files are only portable when they contain only
the 96 Common Lisp "standard" characters, which are the only characters that
currently have portable meaning.

Since a system can support more than one file format, relegating the issue
to the inter-system file transfer mechanism won't work.

    On the issue of portability of programs, I had assumed that programs
    written for Symbolics machines that dealt with, say, mixed Kanji and Roman
    characters, could in fact be run unchanged in Medley. I think I understand
    the Symbolics mechanism for dealing with international characters, and I
    didn't see any portability problems following current CLtL. I really
    thought the issue was only that some implementations really distinguished
    between simple and displaceable strings.

The real issue, I think, has to do with the additional primitives that are
needed to operate on extended characters.  However, since the character committee
proposal is weak to nonexistent in this area, and I'm not an expert on this
myself, I will say no more than that I think you would find that programs
written for Symbolics machines that do meaningful operations with Kanji
(as opposed to merely letting the user type in a Kanji string, and disgorging
the same characters when the string is printed, without blowing out into the
debugger) would not port to Medley, because they would call functions that
are either not defined in Medley or do not have the same names.  I've misplaced
my Symbolics Japanese documentation, so I can't give specific examples today.

    Since you "doubt that programs and data files written to exploit extended
    character sets are portable..." perhaps you might be able to construct a
    small example that illustrates this point? 

Hasn't this already been discussed adequately above?  I tell you what, I'll
send you a second copy of this mail message which you will not be able
to read.

    Briefly, the Medley implementation is such that CHAR-CODE-LIMIT is 65536,
    all "extended" characters just have codes above 255. The implementation
    hides any visible distinction between strings that have 8-bits per
    character and those that have 16, i.e., if you try to store (int-char 1234)
    into a string that started out with only 8-bits per character, it quietly
    displaces the string to one with 16-bits per character. 

We could have done that too, but chose not to.  I'm not sure why, but I suspect
it was to avoid compatibility problems with machines that are unable efficiently
to expand strings when storing a fat character into a thin string.
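
The thin-to-fat widening under discussion can be sketched as follows.
This is an assumption-laden illustration, not Medley's or Symbolics'
mechanism: it stores character codes in a plain vector, tracks the width
in a slot, and copies to a wider vector on demand instead of displacing.
FLEX-STRING and its accessors are hypothetical names.

```lisp
(defstruct (flex-string (:constructor %make-flex-string (codes bits)))
  codes   ; vector of character codes
  bits)   ; 8 or 16: current width of each element

(defun make-flex-string (length)
  "Create a thin (8-bit) string of LENGTH zero codes."
  (%make-flex-string
   (make-array length :element-type '(unsigned-byte 8) :initial-element 0)
   8))

(defun flex-code (fs index)
  "Return the code stored at INDEX."
  (aref (flex-string-codes fs) index))

(defun (setf flex-code) (code fs index)
  "Store CODE at INDEX, widening the storage to 16 bits first if needed."
  (check-type code (integer 0 65535))
  (when (and (> code 255) (= (flex-string-bits fs) 8))
    (let ((fat (make-array (length (flex-string-codes fs))
                           :element-type '(unsigned-byte 16))))
      ;; Copy every existing code into the wider vector.
      (replace fat (flex-string-codes fs))
      (setf (flex-string-codes fs) fat
            (flex-string-bits fs) 16)))
  (setf (aref (flex-string-codes fs) index) code))
```

The copy makes a single store cost time proportional to the string
length, which is one way to read the efficiency concern raised above.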

∂31-Oct-88  1911	CL-Characters-mailer 	Re: cs proposal   
Received: from STONY-BROOK.SCRC.Symbolics.COM (SCRC-STONY-BROOK.ARPA) by SAIL.Stanford.EDU with TCP; 31 Oct 88  18:58:26 PST
Received: from EUPHRATES.SCRC.Symbolics.COM by STONY-BROOK.SCRC.Symbolics.COM via CHAOS with CHAOS-MAIL id 485033; Mon 31-Oct-88 21:57:33 EST
Date: Mon, 31 Oct 88 21:57 EST
From: David A. Moon <Moon@STONY-BROOK.SCRC.Symbolics.COM>
Subject: Re: cs proposal
To: masinter.pa@Xerox.COM
cc: cl-characters@sail.stanford.edu
In-Reply-To: <881027-110348-15668@Xerox>
Message-ID: <19881101025702.0.MOON@EUPHRATES.SCRC.Symbolics.COM>
Character-Type-Mappings: (1 0 (NIL 0) (NIL NIL :SMALLER) "TVFONT")
                         (2 0 (NIL 0) (NIL :BOLD NIL) "CPTFONTCB")
                         (3 0 (NIL 0) (NIL :ITALIC NIL) "CPTFONTI")
                         (4 0 ("MOUSE" 0) (NIL NIL NIL) "MOUSE")
                         (5 0 ("Symbol" 0) (NIL NIL NIL) "CPTFONT")
                         (6 0 (NIL 0) (NIL :BOLD-ITALIC NIL) "CPTFONTBI")
                         (7 0 (NIL 0) (NIL :BOLD :VERY-LARGE) "BIGFNTB")
Fonts: CPTFONT, TVFONT, CPTFONTCB, CPTFONTI, MOUSE, CPTFONT, CPTFONTBI, BIGFNTB
Line-fold: No

This is the promised second copy of the message, which Larry will not be
able to read, which I hope will convince him that there is an issue.
(Actually, you can get Gregor to read it for you on his Symbolics
machine.)  If anyone with an Explorer is on this list, they will
probably be able to read most, but not all, of it.  Note: our
representation for extended characters in mail resembles the
representation in files, but is not quite the same, for various
compatibility reasons.

    Date: 27 Oct 88 11:03 PDT
    From: masinter.pa@Xerox.COM

    Your point about portability is interesting, because I had not thought
    there was a portability issue between implementations that currently
    conformed to CLtL.

If there weren't such portability issues, we would hardly need a cleanup
committee so much.

    Of the two aspects, programs and data files, I think only "programs" is the
    issue for X3J13.ε1 The various standardization bodies across the world are
    trying with varying degrees of success to deal with data format
    compatibility. Envos Medley normally writes out files using the Xerox
    Character Standard encoding....


ε0And ε2Symbolicsε0 normally writes out files using the ε2Symbolicsε0 character
encoding.  Don't you think that is a Tower of Babel (βαβεε) and a problem?

I think portability of data -is- an issue for X3J13.  X3J13 should not be
creating new standards for data, but X3J13 -should- be making existing or
proposed standards for data accessible from Common Lisp.  I thought the
character committee did a fairly good job of that, although Symbolics
quarrelled with some of the particular details.  ε3Before the character
committee's proposal, Common Lisp had no way for the programmer to
specify what external data representation to use.

ε0And Lord help somebody who wants portable access to characters
like ε4εεε0 or ε5¬ε0.

    I imagine that it would be possible to write a
    program in any language which, using binary I/O, could convert those data
    files to and from whatever representation Symbolics uses for representing
    files. I imagine we could even fix the implementation to read and write
    Symbolics data files as well as Xerox data files, or Apple or JIS-coded
    Unix. 

But there is no ε6portableε0 way to ε3tellε0 the implementation to do that.

	  Frequently, the "file transfer" mechanism by which data files get
    transported from one computer system to another incorporate data format
    transformation algorithms as well, e.g., to convert between EBCDIC and
    ASCII; I imagine this mechanism could be extended to cover the
    transformation between other character encoding mechanisms, at least for
    files of "straight" text. These transformation mechanisms should be
    adequate for dealing with Common Lisp source program files as well, right?

Yes, Common Lisp source program files have the same issues as data files.
Currently, both kinds of files are only portable when they contain only
the 96 Common Lisp "standard" characters, which are the only characters that
currently have portable meaning.

Since a system can support more than one file format, relegating the issue
to the inter-system file transfer mechanism won't work.

    On the issue of portability of programs, I had assumed that programs
    written for Symbolics machines that dealt with, say, mixed Kanji and Roman
    characters, could in fact be run unchanged in Medley. I think I understand
    the Symbolics mechanism for dealing with international characters, and I
    didn't see any portability problems following current CLtL. I really
    thought the issue was only that some implementations really distinguished
    between simple and displaceable strings.

The real issue, I think, has to do with the additional primitives that are
needed to operate on extended characters.  However, since the character committee
proposal is weak to nonexistent in this area, and I'm not an expert on this
myself, I will say no more than that I think you would find that programs
written for Symbolics machines that do meaningful operations with Kanji
(as opposed to merely letting the user type in a Kanji string, and disgorging
the same characters when the string is printed, without blowing out into the
debugger) would not port to Medley, because they would call functions that
are either not defined in Medley or do not have the same names.  I've misplaced
my Symbolics Japanese documentation, so I can't give specific examples today.

    Since you "doubt that programs and data files written to exploit extended
    character sets are portable..." perhaps you might be able to construct a
    small example that illustrates this point? 

ε7Hasn't this already been discussed adequately above?  I tell you what, I'll
send you a second copy of this mail message which you will not be able
to read.  That should convince you!

ε0    Briefly, the Medley implementation is such that CHAR-CODE-LIMIT is 65536,
    all "extended" characters just have codes above 255. The implementation
    hides any visible distinction between strings that have 8-bits per
    character and those that have 16, i.e., if you try to store (int-char 1234)
    into a string that started out with only 8-bits per character, it quietly
    displaces the string to one with 16-bits per character. 

We could have done that too, but chose not to.  I'm not sure why, but I suspect
it was to avoid compatibility problems with machines that are unable efficiently
to expand strings when storing a fat character into a thin string.

To finish off, here's a photograph of J. R. ``Bob'' Dobbs:

[ZWEI:GRAPHICS-LINE-DIAGRAM: Symbolics-encoded line-drawing data, not reproducible as plain text]

I apologize to anybody who was bothered by the junk mail.

∂31-Oct-88  1921	CL-Characters-mailer 	Re: cs proposal comments    
Received: from ti.com by SAIL.Stanford.EDU with TCP; 31 Oct 88  19:21:28 PST
Received: by ti.com id AA21836; Mon, 31 Oct 88 21:19:11 CST
Received: from Kelvin by tilde id AA26864; Mon, 31 Oct 88 21:12:58 CST
Message-Id: <2803346094-1934097@Kelvin>
Sender: GRAY@Kelvin.csc.ti.com
Date: Mon, 31 Oct 88  21:14:54 CST
From: David N Gray <Gray@DSG.csc.ti.com>
To: Thom Linden <baggins@ibm.com>
Cc: cl-characters@sail.stanford.edu, Bartley@mips.csc.ti.com
Subject: Re: cs proposal comments
In-Reply-To: Msg of 29 Oct 88 19:06:12 PDT from baggins@ibm.com

> >>    The semi-standard characters have been deleted without any reason given.
> 
> The fact that these are not seen as a generally useful or common
> set of characters in existing implementations will be added to the
> revision.

Given that most Common Lisp implementations are based on the ASCII
character set, I find it hard to see how it can be said that these
characters are not "common".  Even EBCDIC has codes defined for these.
Given that most terminals on the market today use ASCII control codes, I
don't see how it can be said that these are not "useful".  It is true
that programs that use these are doing I/O at a lower level than is
typical, and will not be completely portable because of device
dependencies, but that is another matter.  I'm inclined to think that
the list should be extended to include a few additional common control
characters, particularly ESCAPE.

Being able to say, for example, #\TAB is much more portable than having
to say (CODE-CHAR 9), but the current characters proposal has redefined
CODE-CHAR such that calling it with a constant argument has no portable
meaning at all.  Now if I could say something like 
(MAKE-CHARACTER 9 :ASCII) meaning give me the character whose ASCII code
is 9, that might be a reasonable alternative.
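
A sketch of what such a constructor could look like in portable code.
MAKE-CHARACTER here is the hypothetical function named above, not an
existing Common Lisp function, and only the :ASCII case is modelled.  The
sketch avoids CODE-CHAR altogether: graphic characters come from a
literal string, and control characters are looked up by name with
NAME-CHAR, which returns NIL where an implementation does not support
the name.

```lisp
;; ASCII codes 33..126 in order; all are standard characters.
(defparameter *ascii-graphic*
  "!\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~")

;; Names are the semi-standard ones; support is implementation-dependent.
(defparameter *ascii-names*
  '((8 . "Backspace") (9 . "Tab") (10 . "Linefeed") (12 . "Page")
    (13 . "Return") (32 . "Space") (127 . "Rubout")))

(defun make-character (code coded-set)
  "Return the character that CODE denotes in CODED-SET, or NIL if unknown."
  (ecase coded-set
    (:ascii
     (let ((name (cdr (assoc code *ascii-names*))))
       (cond (name (name-char name))
             ((<= 33 code 126) (char *ascii-graphic* (- code 33)))
             (t nil))))))
```

For example, (make-character 65 :ascii) yields #\A on any implementation,
whatever its internal character codes are.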

> >>    There doesn't seem to be any way to find out which repertoire a given
> >>    character object belongs to.
> 
> The problem here is a character object may be a member of several
> repertoires.  We have added a global name *all-repetoire-names*
> which is a list of all repertoires the implementation supports.  At
> a minimum it will contain :BASE and :STANDARD.  Thus it will be
> possible (using characterp) to determine the repertoire(s) to which
> an object belongs.

I think it was Robert Kerns who pointed out to me at the Fairfax meeting
that I was not using the correct terminology for what I wanted to ask.
Page 5 of the proposal says "... a character data object is identified
by its character code ... composed from a character set identifier ...
and a character set index ...".  What I had in mind then was a way to
construct a character object for a particular character set and index
(as in the example above) and to retrieve the character set and index as
separate attributes of a character object.  It seemed that CHAR-CODE
should now return the character set index of the character and there
should be something analogous to CHAR-FONT that returns something
indicating the character set, although it could be a name instead of a
number.

Actually though, what was said at the meeting sounded like the intent
was that the character code would represent an internal character set
containing the union of characters from all supported character sets
without duplication.  Consequently the concept of character set becomes
meaningful only in the context of I/O.  That is an interesting concept
but seems to contradict what is said on pages 5 and 6 of the proposal.  

> >>    It is not clear what the meaning of CHAR-CODE-LIMIT is now.  Does the
> >>    char code include identification of the repertoire?  If so, it would
> >>    seem to be of little use.  If not, then wouldn't the maximum code value
> >>    be different for different repertoires?  If the code size is not able to
> >>    be different for different repertoires, then I don't see how the concept
> >>    of repertoires needs to be different from the old concept of font
> >>    numbers.
> 
>   Our intent is that all external encodings get mapped to a single
> uniform encoding within the Lisp environment. 

Ah, so that _is_ your intent.  Now you need to make the proposal say that.

>    char-code-limit is
> the maximum value possible for the internal coding within a given
> implementation.  Repertoires do not have an associated code; repertoires
> are unordered sets of glyphs.  A coded character set assigns a unique
> code to each member of a given repertoire.

But then CHAR-CODE-LIMIT, CODE-CHAR, and CHAR-CODE have such a different
meaning from CLtL that it may be less confusing to use different names
for them.  For maximum compatibility with old usage, I would want
(CODE-CHAR x) to return the internal character whose index in the base
character set is x, and CHAR-CODE-LIMIT to represent the size of the
base character set.  Or perhaps it is expected that internal codes 0
through 255 (or whatever) would be the same as the base character set?
Maybe what you really have in mind is the Japanese code that has 0
through 255 the same as the ISO codes, which in turn has 0 through 127
the same as ASCII?  Anyway, it seems like there should be some way for
a program to ask how many characters are in the base character set.

>   Fonts, basically, define a unique display 'style' for a given
> repertoire.

That's just one way they can be used.  CLtL mentions, for example, on
page 235 under ALPHA-CHAR-P that "whether a character is alphabetic may
depend on its font number." This seems to endorse the notion of using
different fonts for different repertoires or character sets.  The real
issue is whether you want characters to have orthogonal character set
and index fields or whether you prefer the merged internal character
coding model.

∂01-Nov-88  2326	CL-Characters-mailer 	Re: cs proposal comments    
Received: from argus.Stanford.EDU by SAIL.Stanford.EDU with TCP; 1 Nov 88  23:26:12 PST
Received: from Riverside.SCRC.Symbolics.COM (SCRC-RIVERSIDE.ARPA) by argus.Stanford.EDU with TCP; Tue, 1 Nov 88 23:22:01 PST
Received: from F.ILA.Dialnet.Symbolics.COM (FUJI.ILA.DIALNET.SYMBOLICS.COM) by Riverside.SCRC.Symbolics.COM via DIAL with SMTP id 291763; 2 Nov 88 02:23:14 EST
Date: Tue, 1 Nov 88 21:53 EST
From: Robert W. Kerns <RWK@f.ila.dialnet.symbolics.com>
Subject: Re: cs proposal comments
To: Gray@dsg.csc.ti.com, baggins@ibm.com
Cc: cl-characters@sail.stanford.edu, Bartley@mips.csc.ti.com
In-Reply-To: <2803346094-1934097@Kelvin>
Message-Id: <19881102025326.3.RWK@F.ILA.Dialnet.Symbolics.COM>

    Date: Mon, 31 Oct 88  21:14:54 CST
    From: David N Gray <Gray@DSG.csc.ti.com>
    Being able to say, for example, #\TAB is much more portable than having
    to say (CODE-CHAR 9), but the current characters proposal has redefined
    CODE-CHAR such that calling it with a constant argument has no portable
    meaning at all.  

Well, I don't know why you say that #\TAB is more portable.  At least
(CODE-CHAR 9) is extremely unlikely to blow out in any implementation at
read time! The same CANNOT be said for #\TAB !!

		     Now if I could say something like 
    (MAKE-CHARACTER 9 :ASCII) meaning give me the character whose ASCII code
    is 9, that might be a reasonable alternative.
Yes.  Much more portable than either.  (Provided, of course, that we can
standardize on character-set names!)


    > The problem here is a character object may be a member of several
    > repertoires.  We have added a global name *all-repetoire-names*
    > which is a list of all repertoires the implementation supports.  At
    > a minimum it will contain :BASE and :STANDARD.  Thus it will be
    > possible (using characterp) to determine the repertoire(s) to which
    > an object belongs.

    I think it was Robert Kerns who pointed out to me at the Fairfax meeting
    that I was not using the correct terminology for what I wanted to ask.
    Page 5 of the proposal says "... a character data object is identified
    by its character code ... composed from a character set identifier ...
    and a character set index ...".  What I had in mind then was a way to
    construct a character object for a particular character set and index
    (as in the example above) and to retrieve the character set and index as
    separate attributes of a character object.  It seemed that CHAR-CODE
    should now return the character set index of the character 

No, I think CHAR-CODE should be left alone.  It's not useful for doing
portable I/O, true, but that does not make it useless, by any means.

							       and there
    should be something analogous to CHAR-FONT that returns something
    indicating the character set, although it could be a name instead of a
    number.

No, as pointed out above, there is no "THE" character set.

Instead, you want a way to ask what index a character is in a particular
character set.  You don't want one of the many character sets involved
picked essentially at the whim of the implementation.

    Actually though, what was said at the meeting sounded like the intent
    was that the character code would represent an internal character set
    containing the union of characters from all supported character sets
    without duplication.  Consequently the concept of character set becomes
    meaningful only in the context of I/O.  
No, no, your first sentence is correct, but the second one just does not
follow.  True, most things you do with characters (and character sets)
have to do with I/O, but that's just due to the relationship between
characters and users.  Character sets could be used for things like
lexers, for example, where you use the index to look up an action
routine.

    But then CHAR-CODE-LIMIT, CODE-CHAR, and CHAR-CODE have such a different
    meaning from CLtL that it may be less confusing to use different names
    for them.  
The only way they're different is that they're more clearly defined.
Nowhere does CL make ANY guarantees about the lifetime of the number you
get back from CHAR-CODE.  Is it portable to vendor Y's machine?  Is it
portable to the next version of vendor X's system?  The next model
processor?  The next bootload?  In the absence of any explicit
guarantees, you have to be conservative.

We're not doing anything here but codifying current practice.

	       For maximum compatibility with old usage, I would want
    (CODE-CHAR x) to return the internal character whose index in the base
    character set is x, and CHAR-CODE-LIMIT to represent the size of the
    base character set.  

That minimizes compatibility with old usage.  What do you do if
character x is not in the base character set?  Certainly, what you
propose would not work on a Symbolics system, and never would have.

			 Or perhaps it is expected that internal codes 0
    through 255 (or whatever) would be the same as the base character set?

No, no expectations at all about encodings are made by this proposal.
Nor should there be any such expectations.

    Maybe what you really have in mind is the Japanese code that has 0
    through 255 the same as the ISO codes, which in turn has 0 through 127
    the same as ASCII?  
No.
			Anyway, it seems like there should be some way for
    a program to ask how many characters are in the base character set.

Yes, that is a good idea.  But that's not enough.  It should be extended
to ANY character set.  It also might be nice to know the lowest and
highest index in that character set.  (There is no guarantee of
compactness of indices in character sets).
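
[A minimal sketch of the kind of query described here: asking for a
character's index in a named character set, and for the lowest and
highest index of that set.  Everything in it is an assumption made for
illustration.  The registry *CHARACTER-SETS*, the functions
DEFINE-CHARACTER-SET, CHAR-SET-INDEX, CHAR-SET-CHAR and CHAR-SET-BOUNDS,
and the set name :ASCII-SUBSET are hypothetical and do not appear in
the proposal.]

```lisp
;; Hypothetical registry: character-set name -> (char->index . index->char).
(defvar *character-sets* (make-hash-table :test #'eq))

(defun define-character-set (name pairs)
  "Register NAME with PAIRS, a list of (character . index) conses."
  (let ((by-char (make-hash-table :test #'eql))
        (by-index (make-hash-table :test #'eql)))
    (loop for (char . index) in pairs
          do (setf (gethash char by-char) index
                   (gethash index by-index) char))
    (setf (gethash name *character-sets*) (cons by-char by-index))
    name))

(defun char-set-index (char set-name)
  "Index of CHAR in SET-NAME, or NIL if the set or the character is unknown."
  (let ((tables (gethash set-name *character-sets*)))
    (and tables (values (gethash char (car tables))))))

(defun char-set-char (index set-name)
  "Character at INDEX in SET-NAME, or NIL if there is none."
  (let ((tables (gethash set-name *character-sets*)))
    (and tables (values (gethash index (cdr tables))))))

(defun char-set-bounds (set-name)
  "Lowest and highest index used in SET-NAME, as two values.
Indices in between need not all be assigned."
  (let ((tables (gethash set-name *character-sets*)))
    (when tables
      (loop for index being the hash-keys of (cdr tables)
            minimize index into low
            maximize index into high
            finally (return (values low high))))))

;; A deliberately sparse three-character set using ASCII index values.
(define-character-set :ascii-subset
  '((#\Space . 32) (#\A . 65) (#\a . 97)))
```

[With these definitions, (CHAR-SET-INDEX #\A :ASCII-SUBSET) returns 65
without consulting CHAR-CODE at all, so the internal encoding stays
the implementation's private business.]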

    >   Fonts, basically, define a unique display 'style' for a given
    > repertoire.

    That's just one way they can be used.  CLtL mentions, for example, on
    page 235 under ALPHA-CHAR-P that "whether a character is alphabetic may
    depend on its font number." This seems to endorse the notion of using
    different fonts for different repertoires or character sets.  
Yes, CLtL was *very* confused about just what it expected fonts to mean.
								  The real
    issue is whether you want characters to have orthogonal character set
    and index fields or whether you prefer the merged internal character
    coding model.

What you're talking about here is completely an implementation issue.
If you consider the entire space of characters, assign them numbers
(i.e. you choose a "global" character set), and then partition that into
sub-character sets based on power-of-two values, you get the "character
set and index" model.  In fact, you can generate a very large number of
such models just using the characters in STANDARD-CHAR!

In other words, this isn't the real issue; it's not an issue at all.
The partitioned subfields implementation is a convenient implementation
technique, that will be widely used.  It doesn't make any difference to
the language.

We could decree that every implementation will have a particular
partitioning of the global character set, and will have some function or
other which returned values based on this partitioning.  I don't see
what good this does the user, though, except to help him make his code
non-portable.  Given the incestuous nature of the various writing
systems of the world (let alone such things as ASCII vs EBCDIC), it's
difficult to see a complete partitioning as in any way natural.

When you give up control of the partitioning of the space of characters,
to the whim of the implementors of your system, just what benefit do you
get?  Doesn't it make more sense to choose the partitioning that is of
interest to you?  Aren't you equally likely to do something like want
to separate out the ASCII characters, or separate out the ROMAN
characters?  Or the symbol constituents?  Or the ones in BASE-CHARACTER?
Why insist on an assertion that some arbitrary partitioning is THE
partitioning?

∂02-Nov-88  1100	CL-Characters-mailer 	Re: cs proposal   
Received: from Xerox.COM by SAIL.Stanford.EDU with TCP; 2 Nov 88  11:00:06 PST
Received: from Cabernet.ms by ArpaGateway.ms ; 02 NOV 88 10:52:21 PST
Date: 2 Nov 88 10:46 PST
From: masinter.pa@Xerox.COM
Subject: Re: cs proposal
In-reply-to: David A. Moon <Moon@STONY-BROOK.SCRC.Symbolics.COM>'s message
 of Mon, 31 Oct 88 21:31 EST
To: David A. Moon <Moon@STONY-BROOK.SCRC.Symbolics.COM>
cc: masinter.pa@Xerox.COM, cl-characters@sail.stanford.edu
Message-ID: <881102-105221-1350@Xerox>

David:

I'm sorry that some of my comments were unclear. There are certainly
portability issues between implementations that currently conform to CLtL.
However, I was unaware of any in the area of support for international
character sets. 

On the issues of standards for character encoding, I presume that
the work of ISO SC2/WG2 and ANSI X3L2 will at some point
converge, and allow us all to standardize at least on an interchange format
for extended character sets, at which time we can incorporate the
appropriate character set conversions (from Xerox<->ISO<->Symbolics.) and
that would be the appropriate way to manage the conversion. So I don't
think it is appropriate for X3J13 or SC22/WG16 to attempt to attack the
character encoding "Tower of Babel". I think we should presume that somehow
Lisp programs that are written with extended characters on one system can
get transferred to another system which might use a different encoding and
that the result is that the program on the destination system is in the
"native" encoding but that the transfer has been 1-1.

The character proposal does allow programmers to refer to the external
encoding in systems where there is more than one external encoding, but it
makes no requirements that any system support more than one encoding, and
provides no standard for what those encoding names mean. To say that a file
is ":run-coded" in one implementation says nothing about the external
encoding in another. Presumably if there were a registry of external
formats which associated the various keywords with well specified encoding
standards, the :external-format keyword would have more credibility.

I'll repeat: no extensions to CLtL are *necessary* in order to adequately
support programs that manipulate Kanji in a meaningful way.
The character proposal adds no such functions. The only thing the character
proposal adds are some things that are intended to improve the
performance/space requirements for supporting international character sets.

David, given the Symbolics documentation, I've not had much trouble
constructing a converter which would convert files from Symbolics format to
Xerox format. I had to look hard in the Xerox character set to find
characters that were the equivalent of some of the ones in the Mouse
character set, but for the most part, the transformation is easy. I can't
deal as simply with the ZWEI:GRAPHICS-LINE-DIAGRAM, but the issue at hand
is dealing with character sets, rather than the more general one of
intermixed text and graphics.

So I'll ask again: can you give a Common Lisp *PROGRAM* that is currently
not portable?

One of the more serious problems that the current proposal introduces has
to do with the concept of "base-character". I'll send a separate message on
that.

∂02-Nov-88  1136	CL-Characters-mailer 	What's wrong with base-character 
Received: from Xerox.COM by SAIL.Stanford.EDU with TCP; 2 Nov 88  11:34:43 PST
Received: from Salvador.ms by ArpaGateway.ms ; 02 NOV 88 11:18:49 PST
Date: 2 Nov 88 11:18 PST
From: masinter.pa@Xerox.COM
Subject: What's wrong with base-character
To: cl-characters@sail.stanford.edu
Message-ID: <881102-111849-1435@Xerox>

The notion of "base-character" in the character proposal shares all of the
negative aspects of FIXNUM that the cleanup committee has been struggling
with, while having none of the positive aspects.

Let me start with a scenario. Suppose there is a programmer of Nikko Common
Lisp in Sweden, who writes a program which manipulates strings. In this
program, he only uses base characters in his implementation. Of course, in
his implementation the character that I write here as "ao" and which I
would describe in English as "an a with a little circle on top" is a base
character. So he puts in his programs, that he wants to run fast

(declare (type base-string x y))

since all of the strings that he manipulates contain only the
alphabetic characters of his native language, which includes the roman a-z
but also a couple of others that are used in Swedish.

Now, he sends me his program. As part of the file transfer mechanism, I
convert his Common Lisp program, which includes some constants like #\ao
(except this is really a-with-little-circle), from his character
representation to mine. Now, I'm running Lufranzi Common Lisp. In Lufranzi
Common Lisp, only ASCII characters are in the "base" character set, and all of the
other international characters, including a-with-little-circle, are
extended characters. Thus, #\ao, which in his implementation was a base
character, is not one in mine.

Thus, when his program passes a string "ao" to a function that contains the
legitimate declaration

(declare (type base-string x y))

the declaration is false! Lufranzi Common Lisp, as all good Common Lisp
compilers are free to do, assumes that the declarations are correct when I
have speed=3 and safety=0 set, and proceeds to perform incorrect operations
on my strings, e.g., storing 8-bit characters into a string which is
supposed to have 16-bit characters.

= = = = = = = = = = = =

The above is a long-winded example of the reason that declarations that are
part of the standard should have portable meaning. At least with FIXNUM,
there is some mathematical regularity to the space of integers to the point
where careful programmers might reasonably be expected to deal with the
subrange between MOST-NEGATIVE-FIXNUM and MOST-POSITIVE-FIXNUM. It is not
as if it might be legitimate to define FIXNUM to be, e.g., the subset of
primes! However, there is no such regularity in the world of characters and
"base" character sets. The definition of "base-character" is bankrupt, and
will lead to non-portable programs where there are none today.

I support the addition of declaration of well-defined, registered subranges
of character ranges using the (CHARACTER :STANDARD) convention, since such
declarations are portable, and *HAVE THE SAME MEANING* in all
implementations. "base-character" and "base-string" do not, and should not
be added to Common Lisp. The cost of adding them is high. The benefits are
non-existent.

Larry

∂05-Nov-88  0921	CL-Characters-mailer 	Re: cs proposal   
Received: from argus.Stanford.EDU by SAIL.Stanford.EDU with TCP; 5 Nov 88  09:20:50 PST
Received: from Riverside.SCRC.Symbolics.COM (SCRC-RIVERSIDE.ARPA) by argus.Stanford.EDU with TCP; Sat, 5 Nov 88 09:14:13 PST
Received: from F.ILA.Dialnet.Symbolics.COM (FUJI.ILA.Dialnet.Symbolics.COM) by Riverside.SCRC.Symbolics.COM via DIAL with SMTP id 292760; 5 Nov 88 12:20:08 EST
Date: Sat, 5 Nov 88 11:46 EST
From: Robert W. Kerns <RWK@f.ila.dialnet.symbolics.com>
Subject: Re: cs proposal
To: masinter.pa@xerox.com, Moon@stony-brook.scrc.symbolics.com
Cc: cl-characters@sail.stanford.edu
In-Reply-To: <881102-105221-1350@Xerox>
Message-Id: <19881105164628.4.RWK@F.ILA.Dialnet.Symbolics.COM>

    Date: 2 Nov 88 10:46 PST
    From: masinter.pa@Xerox.COM
    So I'll ask again: can you give a Common Lisp *PROGRAM* that is currently
    not portable?

Yes.

(defun no-soup (string)
  (dolist (char '(#\<hiragana-tsu> #\s #\o #\u #\p))
    (setq string (remove char string)))
  string)

There's no way to conditionalize this code or anything to allow it to be
readable in systems where it doesn't have to worry about hiragana, and
yet allow it to reference it in systems where it does support it.

This is still a hole in the proposal as it stands.  After the meeting,
some of us discussed adding a requirement that all characters be named,
including the ISO names (if registered with ISO).  This would allow the
above to be rewritten like this:


(defun no-soup (string)
  (let ((tsu (name-char "JH30")))  ;; I don't have the real table of names handy.
    (when tsu
      (setq string (remove tsu string))))
  (dolist (char '(#\s #\o #\u #\p))
    (setq string (remove char string)))
  string)
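
[A minimal sketch of the same idea in general form, assuming only that
NAME-CHAR returns NIL for a name the implementation does not know.  The
function REMOVE-NAMED-CHARS is hypothetical, and which names beyond
"Space" and "Newline" are recognized depends on the implementation.]

```lisp
(defun remove-named-chars (string names)
  "Return STRING without the characters named in NAMES.
Names this implementation does not recognize are silently skipped."
  (dolist (name names string)
    (let ((char (name-char name)))
      ;; NAME-CHAR yields NIL when no character has this name, so the
      ;; source file never contains a character the reader might reject.
      (when char
        (setq string (remove char string))))))
```

[Because the characters are looked up at run time, the file reads in
every implementation, whether or not it supports the characters named.]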

∂05-Nov-88  0925	CL-Characters-mailer 	What's wrong with base-character 
Received: from argus.Stanford.EDU by SAIL.Stanford.EDU with TCP; 5 Nov 88  09:25:28 PST
Received: from Riverside.SCRC.Symbolics.COM (SCRC-RIVERSIDE.ARPA) by argus.Stanford.EDU with TCP; Sat, 5 Nov 88 09:18:41 PST
Received: from F.ILA.Dialnet.Symbolics.COM (FUJI.ILA.Dialnet.Symbolics.COM) by Riverside.SCRC.Symbolics.COM via DIAL with SMTP id 292762; 5 Nov 88 12:24:07 EST
Date: Sat, 5 Nov 88 12:11 EST
From: Robert W. Kerns <RWK@f.ila.dialnet.symbolics.com>
Subject: What's wrong with base-character
To: masinter.pa@xerox.com, cl-characters@sail.stanford.edu
In-Reply-To: <881102-111849-1435@Xerox>
Message-Id: <19881105171158.5.RWK@F.ILA.Dialnet.Symbolics.COM>

    Date: 2 Nov 88 11:18 PST
    From: masinter.pa@Xerox.COM
						       At least with FIXNUM,
    there is some mathematical regularity to the space of integers to the point
    where careful programmers might reasonably be expected to deal with the
    subrange between MOST-NEGATIVE-FIXNUM and MOST-POSITIVE-FIXNUM. 

    I support the addition of declaration of well-defined, registered subranges
    of character ranges using the (CHARACTER :STANDARD) convention, since such
    declarations are portable, and *HAVE THE SAME MEANING* in all
    implementations. "base-character" and "base-string" do not, and should not
    be added to Common Lisp. The cost of adding them is high. The benefits are
    non-existent.

Larry, your example of problems with BASE-CHARACTER depends on
incompetent usage of these types.  If you're going to declare a variable
to be BASE-STRING, *IT IS YOUR RESPONSIBILITY* to also check the
characters that you're placing into it.  This is no different than
arrays of unsigned bytes, for example.  Mathematical regularity has
nothing to do with the issue; if you can test it, it's a useful
subrange.
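
[A minimal sketch of such a check.  It assumes the type names BASE-CHAR
and BASE-STRING as they appear in ANSI Common Lisp, standing in for the
proposal's BASE-CHARACTER and BASE-STRING.  The function
CHECKED-BASE-STRING is hypothetical.]

```lisp
(defun checked-base-string (string)
  "Copy STRING into a BASE-STRING, or return NIL if any of its
characters is not a BASE-CHAR in this implementation."
  (when (every (lambda (char) (typep char 'base-char)) string)
    (make-array (length string)
                :element-type 'base-char
                :initial-contents string)))
```

[The test runs in whichever implementation the code has been ported
to, so a string that qualified on the original system is rejected
rather than mishandled where its characters are not base characters.]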

Yes, BASE-CHARACTER/STRING is an oportunity for people to write
incontinent, unportable code.  That does not render it useless, or
unportable.  It *DOES* require proper description of how to use them.

I also doubt that using these types in *declarations* will be of any
value to anyone.  To my mind, that's not why those are there.

I completely fail to see how you can argue that the cost of adding these
is high.  I can't imagine an implementation spending more than half an
hour (times the usual factor of 3-5 for all time estimates!)
implementing this.

However, all that said, I will also say that given all the well-defined,
registered subranges we expect to have, and the tools for using them,
I'm less adamant about wanting these.  If we removed them from the
language, I'd end up doing something like this:

(defconstant *base-character*
	     (array-element-type (make-array 0 :element-type
					     'standard-char)))

(deftype base-character ()
  *base-character*)

(deftype base-string (&optional (length '*))
  `(array ,*base-character* (,length)))

[I would hope that, given these definitions, any type system will
compile as efficient code for (typep char 'base-character) as it
would for (typep char '#.*base-character*).]

The value of *BASE-CHARACTER* will vary widely between implementations,
and to me, this freedom of type naming is just gratuitous
incompatibility that ought to be eliminated.  It's also sometimes a
convenience, and I also think it will help sell the proposal in some
quarters, such as Japan.  But since I can define it portably, I don't
think it's critical.

P.S.:  I'm not implying that you're incompetent because your example
shows incompetent usage!  You rejected an incompetent usage, and only
failed to note a possible compitent usage.  Quite another matter!  ;=)

I left my spellos for you to critisize my competency, if you want.

∂05-Nov-88  0926	CL-Characters-mailer 	Re: cs proposal   
Received: from argus.Stanford.EDU by SAIL.Stanford.EDU with TCP; 5 Nov 88  09:26:04 PST
Received: from Riverside.SCRC.Symbolics.COM (SCRC-RIVERSIDE.ARPA) by argus.Stanford.EDU with TCP; Sat, 5 Nov 88 09:19:25 PST
Received: from F.ILA.Dialnet.Symbolics.COM (FUJI.ILA.Dialnet.Symbolics.COM) by Riverside.SCRC.Symbolics.COM via DIAL with SMTP id 292763; 5 Nov 88 12:25:14 EST
Date: Sat, 5 Nov 88 12:15 EST
From: Robert W. Kerns <RWK@f.ila.dialnet.symbolics.com>
Subject: Re: cs proposal
To: masinter.pa@xerox.com, Moon@stony-brook.scrc.symbolics.com
Cc: cl-characters@sail.stanford.edu
In-Reply-To: <881102-105221-1350@Xerox>
Message-Id: <19881105171528.7.RWK@F.ILA.Dialnet.Symbolics.COM>

    Date: 2 Nov 88 10:46 PST
    From: masinter.pa@Xerox.COM
    The character proposal does allow programmers to refer to the external
    encoding in systems where there is more than one external encoding, but it
    makes no requirements that any system support more than one encoding, and
    provides no standard for what those encoding names mean. To say that a file
    is ":run-coded" in one implementation says nothing about the external
    encoding in another. Presumably if there were a registry of external
    formats which associated the various keywords with well specified encoding
    standards, the :external-format keyword would have more credibility.

I certainly think a registry is required.  Isn't this one of the namings
we were asking the ISO subcommittees for?

But even a local registry (i.e. a variable listing the possible
encodings) is useful.  You can always give the user his choice.
That's portable.  Not necessarily ideal, but sometimes it's even the
right thing.
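
[A minimal sketch of such a local registry.  The variable
*KNOWN-EXTERNAL-FORMATS*, the function RESOLVE-EXTERNAL-FORMAT and the
fallback rule are all assumptions made for illustration.  Only :DEFAULT
is a name every implementation is sure to accept.]

```lisp
(defvar *known-external-formats* '(:default)
  "Hypothetical list of the external-format names this system supports.
The first element is used as the fallback.")

(defun resolve-external-format (requested
                                &optional (known *known-external-formats*))
  "Return REQUESTED if it is listed in KNOWN, else the first entry of KNOWN."
  (if (member requested known)
      requested
      (first known)))
```

[A program can present the contents of the list to the user and pass
the answer through RESOLVE-EXTERNAL-FORMAT before opening a file.]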

∂07-Nov-88  1826	CL-Characters-mailer 	Re: cs proposal   
Received: from STONY-BROOK.SCRC.Symbolics.COM (SCRC-STONY-BROOK.ARPA) by SAIL.Stanford.EDU with TCP; 7 Nov 88  18:26:12 PST
Received: from EUPHRATES.SCRC.Symbolics.COM by STONY-BROOK.SCRC.Symbolics.COM via CHAOS with CHAOS-MAIL id 488732; Mon 7-Nov-88 21:25:36 EST
Date: Mon, 7 Nov 88 21:25 EST
From: David A. Moon <Moon@STONY-BROOK.SCRC.Symbolics.COM>
Subject: Re: cs proposal
To: masinter.pa@Xerox.COM
cc: cl-characters@sail.stanford.edu
In-Reply-To: <881102-105221-1350@Xerox>
Message-ID: <19881108022528.8.MOON@EUPHRATES.SCRC.Symbolics.COM>
Line-fold: No

    Date: 2 Nov 88 10:46 PST
    From: masinter.pa@Xerox.COM

    The character proposal does allow programmers to refer to the external
    encoding in systems where there is more than one external encoding, but it
    makes no requirements that any system support more than one encoding, and
    provides no standard for what those encoding names mean. 

I agree with RWK's comments here, to wit, rather than remove the ability to
name external encodings we should standardize the names.

    I'll repeat: no extensions to CLtL are *necessary* in order to adequately
    support programs that manipulate Kanji in a meaningful way.
    The character proposal adds no such functions. The only thing the character
    proposal adds are some things that are intended to improve the
    performance/space requirements for supporting international character sets.

I personally think it's a defect of the character proposal that it
doesn't add such functions, nor clarify the meaning of existing
functions (e.g. ALPHA-CHAR-P) for non-Latin character sets.  Symbolics'
comments on the character proposal didn't comment on this issue, but
I think this was the gist of ILA's comments.  I'll shut up about this
issue and leave it in the hands of Bob Kerns and Mark Son-Bell.

    One of the more serious problems that the current proposal introduces has
    to do with the concept of "base-character". I'll send a separate message on
    that.

I eagerly await with trepidation.

∂08-Nov-88  1733	CL-Characters-mailer 	Re: cs proposal   
Received: from Xerox.COM by SAIL.Stanford.EDU with TCP; 8 Nov 88  17:33:37 PST
Received: from Semillon.ms by ArpaGateway.ms ; 08 NOV 88 17:07:14 PST
Date: 8 Nov 88 17:07 PST
From: masinter.pa@Xerox.COM
Subject: Re: cs proposal
In-reply-to: Robert W. Kerns <RWK@F.ILA.Dialnet.Symbolics.COM>'s message of
 Sat, 5 Nov 88 11:46 EST
To: Robert W. Kerns <RWK@F.ILA.Dialnet.Symbolics.COM>
cc: masinter.pa@Xerox.COM, Moon@STONY-BROOK.SCRC.Symbolics.COM,
 cl-characters@sail.stanford.edu
Message-ID: <881108-170715-1216@Xerox>

Your example 

(defun no-soup (string)
  (dolist (char '(#\<hiragana-tsu> #\s #\o #\u #\p))
    (setq string (remove char string)))
  string)


is no more or less portable than

(defun no-tab (string)
  (dolist (char '(#\	 #\t #\a #\b))
    (setq string (remove char string)))
  string)


(I believe that the Arpanet mailer will correctly transmit that as dolist
(char '(#\<tab> #\t #\a #\b)), but I included this sentence just in case it
didn't.)

This certainly points to an issue that affects portability of programs and
has something to do with character encoding, namely, how do programs which
use non-standard characters in the source text get ported to systems which
do not support the same non-standard characters.  However, my original
claim stands: there is nothing in the  September 9, 1988 document* that
improves portability of Common Lisp programs. 

Certainly I can write and run the no-soup program in Medley. It looks like:

(defun no-soup (string)
  (dolist (char '(#\$C #\s #\o #\u #\p))
    (setq string (remove char string)))
  string)


So I think that in Common Lisp systems that do support International
Character sets, no changes to the language are required for programs
to port reasonably between one such system and another. There is still
this other problem having to do with non-overlapping character sets, but
the 9 Sept 88 proposal doesn't address it.


* Reference:  "DRAFT: Extensions to Common LISP to Support International
Character Sets", Beckerle, Beiser, Kerns, Layer, Linden, Masinter, Sept 9,
1988. 

∂09-Nov-88  1410	CL-Characters-mailer 	Re: What's wrong with base-character  
Received: from Xerox.COM by SAIL.Stanford.EDU with TCP; 9 Nov 88  14:10:36 PST
Received: from Semillon.ms by ArpaGateway.ms ; 09 NOV 88 13:56:10 PST
Date: 9 Nov 88 13:55 PST
From: masinter.pa@Xerox.COM
Subject: Re: What's wrong with base-character
In-reply-to: Robert W. Kerns <RWK@F.ILA.Dialnet.Symbolics.COM>'s message of
 Sat, 5 Nov 88 12:11 EST
To: Robert W. Kerns <RWK@F.ILA.Dialnet.Symbolics.COM>
cc: masinter.pa@Xerox.COM, cl-characters@sail.stanford.edu
Message-ID: <881109-135610-2431@Xerox>

You say that my example "depends on incompetent usage of the types". But
the usage in the example is a correct, appropriate, legitimate usage in the
implementation in which it starts out.

The original programmer did all of the things you said: before he declared
his variable to be BASE-STRING, he checked the characters he was
placing into it. It's only that, in his implementation, he mistakenly
assumed that BASE-STRING meant what his documentation said that BASE-STRING
meant, namely, a string which only had characters that were BASE-CHARACTERS
in the implementation he was working on. However, the fine print is that
you can't say BASE-CHARACTER when you mean (CHARACTER
:FOOBAR-COMMON-LISP-BASE-CHARACTER), unless you write code that tests for
BASE-ness at every step of the way.

This is similar to the problem with FIXNUM -- people write (DECLARE (TYPE
FIXNUM X Y Z)) in implementation A, and then port their program to
implementation B, and find that it no longer works because implementation
B's FIXNUM range is smaller than implementation A's. This is a serious
problem with FIXNUM, and even more serious with BASE-CHARACTER.

However, you say  "I also doubt that using these types in *declarations*
will be of any value to anyone.  To my mind, that's not why those are
there."  If they're not used in declarations, how are they used? I have
much more trouble imagining using BASE-CHARACTER and BASE-STRING except in
declarations.

You say "I completely fail to see how you can argue that the cost of adding
these is high. "

Sorry. When I said "cost" I didn't mean "cost to implementor".  Certainly
it is easy to add the type "base-character" to an implementation. I mean
"costs to users", and actually "costs to users trying to port other
people's code". 


∂09-Nov-88  1927	CL-Characters-mailer 	Re: What's wrong with base-character  
Received: from argus.Stanford.EDU by SAIL.Stanford.EDU with TCP; 9 Nov 88  19:27:18 PST
Received: from Riverside.SCRC.Symbolics.COM (SCRC-RIVERSIDE.ARPA) by argus.Stanford.EDU with TCP; Wed, 9 Nov 88 19:20:36 PST
Received: from F.ILA.Dialnet.Symbolics.COM (FUJI.ILA.Dialnet.Symbolics.COM) by Riverside.SCRC.Symbolics.COM via DIAL with SMTP id 293546; 9 Nov 88 22:24:45 EST
Date: Wed, 9 Nov 88 22:16 EST
From: Robert W. Kerns <RWK@f.ila.dialnet.symbolics.com>
Subject: Re: What's wrong with base-character
To: masinter.pa@xerox.com, RWK@f.ila.dialnet.symbolics.com
Cc: cl-characters@sail.stanford.edu
In-Reply-To: <881109-135610-2431@Xerox>
Message-Id: <19881110031608.2.RWK@F.ILA.Dialnet.Symbolics.COM>

    Date: 9 Nov 88 13:55 PST
    From: masinter.pa@Xerox.COM

    You say that my example "depends on incompetent usage of the types". But
    the usage in the example is a correct, appropriate, legitimate usage in the
    implementation in which it starts out.

Well, the statement is a little strong.  What I meant is that it is not
an appropriate, legitimate usage for portable code, which is what we're
talking about here.

It seems that you're compounding things now by assuming both that the
programmer doesn't know what he's doing and that the person who wrote
the documentation also didn't do an adequate job.

Since what you're trying to claim is that it ISN'T useful, you need a
different kind of argument.  You need to show that someone who DOES know
what the type means, and who has READ the spec and UNDERSTOOD it, cannot
use it portably.  I believe we've already disproved this.

Perhaps you want to be arguing that it's hard to understand, or hard to
document.  I don't agree, but I think it would be a more sound basis for
discussion.

    The original programmer did all of the things you said: before he declared
    his variable to be BASE-STRING, he checked the characters he was
    placing into it. It's only that, in his implementation, he mistakenly
    assumed that BASE-STRING meant what his documentation said that BASE-STRING
    meant, namely, a string which only had characters that were BASE-CHARACTERS
    in the implementation he was working on. 

No, that's not what I said.  I meant, he checked *IN THE CODE*.  If
you're writing portable code, a competent programmer won't check one
implementation's documentation for a result!

					     However, the fine print is that
    you can't say BASE-CHARACTER when you mean (CHARACTER
    :FOOBAR-COMMON-LISP-BASE-CHARACTER), unless you write code that tests for
    BASE-ness at every step of the way.

No, unlike FIXNUM's, you don't have to check every step of the way.  You
can even check just on primary input to your program, and set a single
global flag indicating that you have EVER seen a
(AND CHARACTER (NOT BASE-CHARACTER)), and act accordingly.  You may only
need to check this flag in one place: the routine which allocates your
string buffers.
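
A minimal sketch of the single-flag technique described above, offered only
as an illustration. It assumes the later ANSI Common Lisp spelling BASE-CHAR
for the type called BASE-CHARACTER in this thread, and the names
*SEEN-EXTENDED-CHAR*, NOTE-INPUT-CHAR and MAKE-TEXT-BUFFER are hypothetical,
not taken from the proposal or from any implementation.

```lisp
;; Sketch only.  BASE-CHAR is the ANSI Common Lisp name for the type this
;; thread calls BASE-CHARACTER; every other name here is hypothetical.

(defvar *seen-extended-char* nil
  "True once any input character that is not a BASE-CHAR has been seen.")

(defun note-input-char (char)
  "Call on primary input.  Records whether CHAR falls outside BASE-CHAR."
  (unless (typep char 'base-char)
    (setf *seen-extended-char* t))
  char)

(defun make-text-buffer (size)
  "The one routine that consults the flag when allocating string buffers."
  (make-string size
               :element-type (if *seen-extended-char*
                                 'character
                                 'base-char)))
```

The design point is that the type test happens once per input character,
and the choice of representation happens in exactly one allocation routine.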

That technique would require a lot of painstaking deduction and
mathematical analysis to do for the type FIXNUM, just to determine what
you have to constrain your inputs to.  Experience shows it is often
wrong, because the operators on numbers have much hairier consequences
from a type reasoning point of view.

    This is similar to the problem with FIXNUM -- people write (DECLARE (TYPE
    FIXNUM X Y Z)) in implementation A, and then port their program to
    implementation B, and find that it no longer works because implementation
    B's FIXNUM range is smaller than implementation A's. This is a serious
    problem with FIXNUM, and even more serious with BASE-CHARACTER.

No, it's MUCH less of a problem, because you don't do calculations with
characters.  If you start with just BASE-CHARACTER's, the set is closed;
there are no operations in CL which yield CHARACTER's which are not
BASE-CHARACTERS, when given only BASE-CHARACTER's.  That is not true of fixna.

    However, you say  "I also doubt that using these types in *declarations*
    will be of any value to anyone.  To my mind, that's not why those are
    there."  If they're not used in declarations, how are they used? I have
    much more trouble imagining using BASE-CHARACTER and BASE-STRING except in
    declarations.

If you're just going to use them to do declarations, I'd agree with you
about leaving them out.  But if I'm writing code which is building a
database of text, and I have to create huge quantities of this
information, and in many implementations, BASE-STRING is up to a factor
of four better in space and paging, I will want to write my code to
check if I can store the data more efficiently.  Also, if I have a
BASE-STRING, and someone gives me a character, I want to know:

a)  That I have a string which is basic, so I must check & find out
b)  that I have a character that will fit.

If I have one that won't fit, I may have to allocate a new string.

Now, I could use (TYPEP CHARACTER (ARRAY-ELEMENT-TYPE STRING)), but
unoptimized TYPEP's are very slow in most implementations.  BASE-STRING
and BASE-CHARACTER allow me to write code which is portably aware of the
dichotomy in storage efficiency of strings, and to use that to write
code which is more efficient while still fully portable.
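
One way the "will it fit, else allocate a new string" check might look, as a
hedged sketch. It assumes the ANSI Common Lisp names BASE-CHAR and
BASE-STRING, and it assumes an implementation with only the two string
representations discussed here (base and general). STORE-CHAR is a
hypothetical helper, not part of the proposal.

```lisp
;; Sketch only.  Assumes just two string representations: BASE-STRING and
;; a general string able to hold any CHARACTER.  STORE-CHAR is hypothetical.

(defun store-char (string index char)
  "Store CHAR at INDEX.  Return STRING itself when CHAR fits, otherwise a
newly allocated general string holding the old contents plus CHAR."
  (if (or (not (typep string 'base-string))   ; general string: anything fits
          (typep char 'base-char))            ; base string, base character
      (progn
        (setf (char string index) char)
        string)
      ;; CHAR will not fit: widen to a general string and copy.
      (let ((wide (make-string (length string) :element-type 'character)))
        (replace wide string)
        (setf (char wide index) char)
        wide)))
```

Callers must use the returned value, since the original string may have
been replaced by a wider copy.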

[Yes, I know it's nice to have the system do it for you.]

Now, an implementation may have more string types available, and to
make portable use of those, I have to work harder.  That's OK.  But the
basic/non-basic dichotomy will be very common.

Now, as I said in my last message, there are ways of portably
programming around this.  I can actually implement the types myself
portably, for any number of types involved.  But my claim is (and I
think this is really the core issue to discuss) that more people will
fail to write portable code because they can't figure out how to
portably define the types involved.

The programmer you posit, who will declare everything BASE-CHARACTER and
thus write non-portable code, won't give up his goal just because you take
away his nice type.  You'll have to take away his keyboard, too, because
he's going to find another way to say the same wrong thing.  I think
we're better off giving people the type, and then telling them how to
use it portably, and how not to.

    You say "I completely fail to see how you can argue that the cost of adding
    these is high. "

    Sorry. When I said "cost" I didn't mean "cost to implementor".  Certainly
    it is easy to add the type "base-character" to an implementation. I mean
    "costs to users", and actually "costs to users trying to port other
    people's code". 

Well, foo, I would be happy to write them an EMACS macro that takes out
all the BASE-CHARACTER and BASE-STRING declarations.  I'm still not convinced,
now that I know what you meant.  I doubt their programs would run much
slower, anyway.

∂09-Nov-88  2025	CL-Characters-mailer 	Re: cs proposal   
Received: from argus.Stanford.EDU by SAIL.Stanford.EDU with TCP; 9 Nov 88  20:25:10 PST
Received: from Riverside.SCRC.Symbolics.COM (SCRC-RIVERSIDE.ARPA) by argus.Stanford.EDU with TCP; Wed, 9 Nov 88 20:18:25 PST
Received: from F.ILA.Dialnet.Symbolics.COM (FUJI.ILA.Dialnet.Symbolics.COM) by Riverside.SCRC.Symbolics.COM via DIAL with SMTP id 293554; 9 Nov 88 23:22:41 EST
Date: Wed, 9 Nov 88 23:14 EST
From: Robert W. Kerns <RWK@f.ila.dialnet.symbolics.com>
Subject: Re: cs proposal
To: masinter.pa@xerox.com, RWK@f.ila.dialnet.symbolics.com
Cc: Moon@stony-brook.scrc.symbolics.com, cl-characters@sail.stanford.edu
In-Reply-To: <881108-170715-1216@Xerox>
Message-Id: <19881110041406.9.RWK@F.ILA.Dialnet.Symbolics.COM>

    Date: 8 Nov 88 17:07 PST
    From: masinter.pa@Xerox.COM

    Your example 
      (dolist (char '(#\<hiragana-tsu> #\s #\o #\u #\p))
    is no more or less portable than
      (dolist (char '(#\	 #\t #\a #\b))
Agreed; the problem is not new, just magnified.

I would rather discuss how to solve this than argue about whether or not
I am able to use BASE-CHARACTER portably.

My opinion about how to solve this is that a fully-portable program
includes no non-STANDARD-CHAR characters in the source, except maybe in
comments, and that any references to non-STANDARD-CHAR's are done by
naming those characters in terms of printing STANDARD-CHARACTER's.  Note
that this is how we already handle the non-printing characters, which
are required to have names, like #\Tab.

It's not enough for them to have names, of course; we have to
standardize on the names.  Fortunately, the ISO names are portable and
standardized.  Unfortunately they're ugly and cryptic.  Fortunately,
they're better than #\$C.  (I guess, if you squint and use a mirror, you
can see a resemblance between the C and the tsu.  At least, if your
office mate has been smoking and the pollen count is high.)

I guess we could consider ways to allow users to define their own names
for characters, and require the ISO names be present.  We have to watch
out that we don't lead to portable applications stepping on each other's
character names.

    This certainly points to an issue that affects portability of programs and
    has something to do with character encoding, namely, how do programs which
    use non-standard characters in the source text get ported to systems which
    do not support the same non-standard characters.  
I don't think this is feasible to solve, which is why I suggest fully
portable programs don't have any but STANDARD-CHAR.  Even in comments,
they leave themselves open to the capabilities of the translating
process, which may not always be capable of distinguishing an #o015 byte
of a two-byte code from a newline in the middle of your comment.

						      However, my original
    claim stands: there is nothing in the  September 9, 1988 document* that
    improves portability of Common Lisp programs. 

I don't agree with this claim.  I think the arguments to OPEN do improve
the portability, and I think that if implementations have multiple
string types (which many already do), that BASE-CHARACTER and
BASE-STRING make taking advantage of this more convenient, when used
properly.

But I don't think that was really your original claim.  I think you
originally claimed that nothing it did was REQUIRED to be able to write
portable programs.

    Certainly I can write and run the no-soup program in Medley. It looks like:

    (defun no-soup (string)
      (dolist (char '(#\$C #\s #\o #\u #\p))
	(setq string (remove char string)))
      string)


    So I think that in Common Lisp systems that do support International
    Character sets that there are no changes required to the language to make
    systems port reasonably between one such system and another. 

Well, I don't see how your NO-SOUP example advances this claim; it looks
to me like a counter example.  You have this private kludge to allow
entry of tsu, and I have mine, and they're not the same.

Or perhaps I'm getting faked out by the mail system, and you forgot to
tell me that the $C I'm seeing is really supposed to be a tsu?  Given
file translation (which is anything but given, as this shows!), you're
right, but then neither of us is portable to any non-Japanese-supporting
systems.

								 There is still
    this other problem having to deal with non-overlapping character sets, but
    the 9 Sept 88 proposal doesn't address them.

I'm afraid this phrase is bound to more than one concept in my mind.
Can you clarify which problem you want to discuss?

    * Reference:  "DRAFT: Extensions to Common LISP to Support International
    Character Sets", Beckerle, Beiser, Kerns, Layer, Linden, Masinter, Sept 9,
    1988. 


∂09-Nov-88  2029	CL-Characters-mailer 	Re: cs proposal   
Received: from argus.Stanford.EDU by SAIL.Stanford.EDU with TCP; 9 Nov 88  20:29:40 PST
Received: from Riverside.SCRC.Symbolics.COM (SCRC-RIVERSIDE.ARPA) by argus.Stanford.EDU with TCP; Wed, 9 Nov 88 20:22:42 PST
Received: from F.ILA.Dialnet.Symbolics.COM (FUJI.ILA.DIALNET.SYMBOLICS.COM) by Riverside.SCRC.Symbolics.COM via DIAL with SMTP id 293558; 9 Nov 88 23:28:19 EST
Date: Wed, 9 Nov 88 23:17 EST
From: Robert W. Kerns <RWK@f.ila.dialnet.symbolics.com>
Subject: Re: cs proposal
To: RWK@f.ila.dialnet.symbolics.com, masinter.pa@xerox.com
Cc: Moon@stony-brook.scrc.symbolics.com, cl-characters@sail.stanford.edu
In-Reply-To: <19881110041406.9.RWK@F.ILA.Dialnet.Symbolics.COM>
Message-Id: <19881110041752.0.RWK@F.ILA.Dialnet.Symbolics.COM>

    Date: Wed, 9 Nov 88 23:14 EST
    From: Robert W. Kerns <RWK@F.ILA.Dialnet.Symbolics.COM>
			       (I guess, if you squint and use a mirror, you
    can see a resemblance between the C and the tsu.  At least, if your
    office mate has been smoking and the pollen count is high.)

In fairness to my various coworkers, I should point out that none of
them smoke.  Nor do any of them emit pollen.  However, a cold virus
lately has been giving me the same effects.

∂12-Nov-88  1801	CL-Characters-mailer 	CS Proposal comments on DRAFT: Exten...    
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 12 Nov 88  18:01:30 PST
Date: Sat, 12 Nov 88 17:41:32 PST
From: Thom Linden <baggins@ibm.com>
To: Larry Masinter <masinter.pa@xerox.com>
cc: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <881112.174132.baggins@IBM.com>
Subject: CS Proposal comments on DRAFT: Exten...

Larry,
  We did decide at Wash DC to place schar into the 'compatibility'
section due to the ambiguous reference to the string type.  While
this type of optimization was agreed as being "odd" it seemed to
be one commonly needed as declarations may be ignored.  So, we
actually introduced two replacements: sbchar and sgchar.
sbchar applies to simple base-strings while sgchar applies to
simple general-strings.

>>    In Xerox Common Lisp / Medley, SCHAR is slower interpreted, since it
>>    actually checks that its argument is a string. The compiled optimizer
>>    generates the same code as AREF.
>>
>>    Frankly, I think SCHAR is an odd beast -- most other declarations and type
>>    annotations in the language are done with "the" and "declare".
>>
>>    Maybe it would do as well to do away with SCHAR.  (A purist would eliminate
>>    them all and say "use ELT",  but that's probably going too far.)
>>
>>    My general point is that some of the optimizations that made sense at the
>>    time CLtL was written no longer do, and we might be able to simplify the
>>    language rather than make it more complex.

∂12-Nov-88  1815	CL-Characters-mailer 	CS Proposal comments. DRAFT: Exten... 
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 12 Nov 88  18:15:11 PST
Date: Sat, 12 Nov 88 17:58:54 PST
From: Thom Linden <baggins@ibm.com>
To: David Unietis <dru@lucid.com>
cc: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <881112.175854.baggins@IBM.com>
Subject: CS Proposal comments. DRAFT: Exten...

David,
  Thanks for the comments.  Also, thanks for joining our discussions
at Wash. DC.

---------------------------------------------------------------

>>    I received the latest draft of the character set proposal, and it seems
>>    to adequately cover most of the issues raised by my earlier comments.  The
>>    issue I brought up concerning the type definition of most-general-string
>>    was entirely my fault - I misread the type definition of string in the latest
>>    draft.  Defining the string type as a disjunction of other types solves
>>    the problem satisfactorily.
>>
>>    I have a few remaining comments on the issues below:
>>
>>
>>    * Simple-strings and SCHAR
>>
>>    We have no direct user experience to report here, but rather are basing our
>>    opinion on the original JEIDA proposal as well as discussions with IBM Japan
>>    and CSK, all of whom strongly desire compatible string access.
>>    Furthermore, we've done some measurements of our prototype Kanji
>>    implementations that treat SCHAR in this manner, and they indicate that the
>>    performance impact is fairly small.  Of course, this experience is only
>>    relevant to general-purpose architectures, it may be more difficult and/or
>>    expensive to re-implement SCHAR this way on microcoded Lisp machines - I
>>    wonder how much influence this contingent has had on the discussion...
>>
>>
>>    * Equivalence classes
>>
>>    To me, it seems unrealistic to expect ISO to standardize on a non-overlapping
>>    character set, when all existing Kanji character sets (at least, all I know
>>    about) contain a 'double-byte' version of either ASCII or EBCDIC embedded in
>>    them.
>>

  In fact, you have listed what appears to be the driving force for
this feature: that many systems in Asia support both a single byte
encoding and a multi-byte encoding for the Latin characters and
their user communities recognize distinctions between the two (I believe
primarily a visual distinction).
  We have requested JIS to submit a proposal on equivalency which would
satisfy their requirements and be added to this proposal.


>>
>>    * JEIDA
>>
>>    I'm concerned that their input may be arriving too late, especially if adopting
>>    their recommendations would result in substantial revisions.  The message you
>>    forwarded from Professor Ito suggests that they do have significant comments.
>>    At very least, I feel we need to set aside part of the Monday meeting to a
>>    review of their suggestions.  If it is possible for you to get the meeting
>>    attendees a copy in advance, it would be helpful.
>>

 I believe JIS WG16 Lisp committee (JEIDA created an early document
but is not the standards body in Japan) has indicated they hope to
introduce an initial document at the November WG16 meeting.  I'll
forward a copy to the Characters subcommittee after I receive a copy.

>>
>>    Overall, the proposal is looking quite good.
>>
>>

∂12-Nov-88  1842	CL-Characters-mailer 	CS Proposal comments. DRAFT: Exten... 
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 12 Nov 88  18:41:52 PST
Date: Sat, 12 Nov 88 18:27:34 PST
From: Thom Linden <baggins@ibm.com>
To: Mike Beckerle and Robert Krajewski <rpk@wheaties.ai.mit.edu>
cc: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <881112.182734.baggins@IBM.com>
Subject: CS Proposal comments. DRAFT: Exten...

Thanks for the comments.  Discussions to clarify the document
will be incorporated in the next revision.

-------------------------------------------------------------------

>>    Mike Beckerle and I have looked over this proposal, and are pretty
>>    satisfied with it.  Most of our concerns stem from ambiguities that
>>    arise because the text does not mention specific objects or cases.
>>
>>    1. The draft says that font and bits information does not affect the
>>    identity of a character.   Identity with respect to what ?
>>
>>        CHAR=        CHAR-INT <=> INT-CHAR
>>        EQL
>>        EQ
>>
>>    The editorial changes imply that CHAR= will still distinguish between
>>    characters with different attributes, which includes
>>    implementation-defined attributes.  All this needs to be clarified.

I believe the later version (you were reviewing DRAFT DRAFT as I recall)
clears some of this up.  Also, (ref. Symbolics' comments) an
implementation will need to document the effect, if any, that
attributes have on char-equal.

>>
>>    For implementations that are going to support bits in a Common Lisp
>>    that adopts something like the current proposal, this is an important
>>    issue.  A few words ought to be said about support of the old font and
>>    bit ideas in the ``new world,'' if an implementation decides to offer
>>    support.  Does it make sense to implement such characters as an
>>    extended repertoire that is a superset of base repertoire ?

We'll add some discussion on this to the revised document.  The
extended repertoire you mention is one possibility, another is
an extended type hierarchy (eg. input-gesture).

>>
>>    2. How does EXTERNAL-WIDTH behave with encoding schemes that employ
>>    formatting information when switching between representations ?  I am
>>    not familiar with such schemes, but I assume that a
>>    multi-character-set stream can have more than one state (where each
>>    state corresponds to one of the supported character sets), and a
>>    different number of octets/quanta/whatever have to be written out for
>>    the same character depending on the state (because the encoding scheme
>>    requires some control commands).  In other words, can EXTERNAL-WIDTH
>>    be passed identical stream and character arguments, and possibly
>>    return different values if there has been some intervening output on
>>    the stream ?
>>

  This is a good point.  For example, with a run-encoded external
representation (Symbolics, Xerox, etc) these state transitions
may be pronounced with several bytes (quanta) of information
written out to identify the new state.
  The revised document will indicate that the external-width
result number corresponds to the current state of the stream
and may change if there has been intervening output.


∂14-Nov-88  1526	CL-Characters-mailer 	CS Proposal comments. DRAFT: Exten... 
Received: from IBM.COM ([192.5.58.7]) by SAIL.Stanford.EDU with TCP; 14 Nov 88  15:25:28 PST
Date: Mon, 14 Nov 88 13:37:02 PST
From: Thom Linden <baggins@ibm.com>
To: "David A. Moon" <moon@scrc-stony-brook.arpa>
cc: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <881114.133702.baggins@IBM.com>
Subject: CS Proposal comments. DRAFT: Exten...

David,
  Thanks for the very thorough reading of the proposal.  The
improvements mentioned in your comments will be incorporated in the
revised document.

---------------------------------------------------------------------
>>    OVERALL COMMENT
>>
>>    In general we agree with this proposal, but there are some defects
>>    in it that need to be remedied before it can be acceptable.  The
>>    proposal is really not ready yet for voting.
>>
>>
>>    MAJOR COMMENTS
>>
>>    * Pages 6 and 18 call for the meaning of the STRING-CHAR type specifier
>>    to be incompatibly changed in the name of compatibility.  We oppose this.
>>    Compatibility would be much easier to achieve by eliminating STRING-CHAR
>>    from the language, allowing a user or an implementation to define it
>>    with DEFTYPE to be whatever they require for compatibility.  (This would
>>    leave (DECLARE (STRING-CHAR x)) undefined, unless an implementation added
>>    it, since there is no way for a user to add declarations.)

The definition of string-char will be changed to implementation
defined.  We are considering moving all the compatibility sections
to a single appendix.  The appendix would likely be an advisory
part of the standard (ie. not part of the standard language).

>>
>>    * Page 11 says that (write-char #\newline stream) is no longer equivalent
>>    to (terpri stream).  This directly contradicts the last paragraph of CLtL
>>    p.22, which this proposal does not amend.  We can see no justification for
>>    this incompatible change; outputting a newline character should remain
>>    equivalent to calling the terpri function.  The fact that many external
>>    character encoding schemes treat newline as a special case applies equally
>>    to the newline character and the terpri function and does not justify
>>    changing them to be non-equivalent.

This will be changed.  We will add the comment that newline must
be valid for any external-format.

>>
>>    * Pages 11 and 34-5:  The EXTERNAL-WIDTH function and FORMAT features are
>>    much less well thought-out than the rest of the proposal, are described in
>>    a self-contradictory way, and are unrelated to the main topic of this
>>    proposal.  They should be removed, and proposed separately when they have
>>    been more carefully thought out.  We could offer more detailed criticisms,
>>    but that doesn't seem useful at this time.  By the way, the Cleanup
>>    committee issue STREAM-INFO appears to cover the same ground.

We will remove the *format-external-width* variable from the
proposal.  We agreed its scope was too narrow and considerations of
apa displays and printers must be taken into account.  We'll check
the cleanup proposal for overlap on the external-width function
but we think such a function is needed.

>>
>>    * Page 21 uses a type-specifier list (character :standard) in an example
>>    but there is no definition of what this means nor what the valid syntax is.

Right.  This will be added.  Valid syntax is a single repertoire
keyword (such as :standard) or a list of repertoire keywords.  The
meaning is a character object which can hold any member of the
noted repertoires.

>>
>>    * Pages 6, 23, and 25 mandate that CHAR-EQUAL is unaffected by all
>>    implementation-defined character attributes.  This is not an acceptable
>>    generalization; the effect, if any, on CHAR-EQUAL of each
>>    implementation-defined character attribute has to be specified as part of
>>    the definition of that attribute.  Symbolics Genera, for example, has one
>>    implementation-defined character attribute that definitely should affect
>>    CHAR-EQUAL and another that definitely should not.

This change will be made in the revision.


>>
>>
>>    MINOR COMMENTS (not so minor that they can be ignored!)
>>
>>    The introduction makes no mention of extended typesetting symbols, such as
>>    accent marks and the copyright and trademark symbols.  If Lisp is to be
>>    used for real-world applications, these are necessary.

It doesn't mention others as well, eg. scientific symbols.  We'll
add a comment to the objectives on p4 that the proposal applies to
these 'languages' as well.

>>
>>    Page 10 refers to the representation of coded character sets as keyword
>>    symbols.  Why not use CLOS objects?  There might be reasons, but you should
>>    state them.

We feel that if CLOS is to be ingrained in every CL application
this might be appropriate.  But that is itself quite arguable.
In any case, the CLOSification of CL is not a topic of this
proposal and should not be done piecemeal.

>>                 Also there should be a portable way to refer to the base
>>    character set.  In general the language representation of character sets
>>    and of character repertoires is very poorly specified and the proposal
>>    needs to be extended to cover this.

We are adding :base as a repertoire name.  In general, the repertoire
names should be ISO standardized names and we are making this
requirement known to the appropriate X3 committee.

>>
>>    Pages 11, 36, 37: There are several problems with OPEN options:
>>
>>     The default value of the :EXTERNAL-CODE-FORMAT argument to OPEN should be
>>     implementation-defined rather than required to be the "natural" encoding
>>     (whatever that is).  The only requirement should be that it be able to
>>     encode the base character set.  It should not be restricted from encoding
>>     other character sets also.  There should be a name for this default value,
>>     probably :DEFAULT.

:default will be added.


>>
>>     There should be a name for the "natural" encoding and there should be a
>>     specification of the properties of the natural encoding that a programmer
>>     can rely on.  Suggestions for the name include :BASE, :NATURAL, and
>>     :INTERCHANGE.  The definition probably involves the concept of data
>>     interchange with non-Lisp programs on the same system.

This will be added to the revision.

>>
>>     There should be names for standard encodings such as ASCII to allow
>>     data interchange between differing systems.

Yes.  We are forwarding a requirement for standardized names to
X3 and ISO.

>>
>>     There should be a defined value for the :CHARACTER-SET option that
>>     specifies all characters that the Lisp implementation can represent.

We are adding *all-repertoires* which is a list of all supported
repertoires.  At minimum it will contain :base and :standard.


>>         OPEN
>>     should signal an error if this :CHARACTER-SET option is used together with
>>     an :EXTERNAL-CODE-FORMAT option that cannot encode all the characters the
>>     Lisp implementation can represent.  Without this, there is no way to write
>>     a correct program that stores arbitrary strings in a file.
>>
>>     The default value of the :ELEMENT-TYPE argument should be an
>>     implementation-defined subtype of CHARACTER that can be a supertype of
>>     BASE-CHARACTER, rather than specified to be exactly BASE-CHARACTER.
>>
>>     It's hard to understand why both :CHARACTER-SET and :ELEMENT-TYPE exist,
>>     since they appear to control the same thing.  It would be best to remove
>>     :CHARACTER-SET and make sure that type-specifiers are expressive enough
>>     to allow :ELEMENT-TYPE to do everything that :CHARACTER-SET could do.
>>     The only justification for a separate :CHARACTER-SET option that can be
>>     inferred from the proposal is that :EXTERNAL-CODE-FORMAT :SHIFT-DELIMITED
>>     needs an -ordered- pair of character sets; this would be more appropriately
>>     specified as a list :EXTERNAL-CODE-FORMAT (:SHIFT-DELIMITED cs1 cs2).

We agree with the above three paragraphs and will incorporate these
comments into the revision.

>>
>>     The guarantee on page 11 that input operations will never return characters
>>     outside the character sets mentioned in the :CHARACTER-SET option should
>>     be removed.  It seems wrong to require more checking in input functions
>>     than in output functions.  The :EXTERNAL-CODE-FORMAT might be capable
>>     of representing more characters than the :CHARACTER-SET option specifies.

This dependency will be removed.

>>
>>     Are the external code format names listed on page 37 a proposal for
>>     standardized names, or merely illustrative examples?

These were illustrative.  Standardized names would come from ISO.

>>
>>     The motivations for the above comments are:
>>       - provide standard names for all portable concepts
>>       - allow, but not require, implementations to make it easy to write
>>         programs that work with multiple character sets without special effort
>>       - put the specification of the internal representation of characters
>>         in one and only one place in the options to OPEN
>>       - put the specification of the external representation of characters
>>         in one and only one place in the options to OPEN
>>
>>
>>    Page 16 (referring to paragraph 6) implies that Space is not a graphic
>>    character, but page 24 (referring to paragraph 6) implies that Space is
>>    a graphic character.  CLtL p.235 says Space is graphic, let's stick with
>>    that.

We'll reword this to make it clear that Space is graphic.

>>
>>    Pages 19 and 20 introduce a new type named simple-base-string, in addition
>>    to simple-string.  If you think about how simple-string would be used for
>>    compiler optimization, it makes sense for simple-string to be the name for
>>    the single simplest representation, rather than a name for a whole family
>>    of representations that would have to be discriminated at run time.  Thus
>>    what you call simple-base-string should be called simple-string, and what
>>    you call simple-string should just be called (simple-array character (*)).
>>    This would not be an incompatible change in the meaning of simple-string.
>>    Simple-string would be analogous to simple-vector.

Simple-string has the same problem as string, i.e., it is ambiguous,
meaning a union of simple-string subtypes. We
are now in favor of making the same modifications for simple-string as
for string.  Analogous to string, we will have simple-base-string and
simple-general-string.  schar will be deprecated as a compatibility
function. Two new functions, sbchar and sgchar, will be
introduced to operate on simple-base and simple-general
strings respectively.

>>
>>    Page 20 proposes to change (COERCE <integer> 'CHARACTER) incompatibly to be
>>    synonymous with CODE-CHAR instead of INT-CHAR.  This change seems
>>    unmotivated.  We would rather delete coercion from integers to characters
>>    entirely, for the same reason that coercion from characters to integers is
>>    not permitted.

We agree with your suggestion to eliminate coercion from integers to
characters.
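
A minimal sketch of what dropping integer-to-character coercion means
for callers, assuming an ANSI Common Lisp implementation.  The helper
name INTEGER->CHARACTER is hypothetical and not part of the proposal;
it spells the conversion out with CODE-CHAR instead of going through
COERCE:

```lisp
;; Hypothetical helper: convert an integer to a character explicitly.
(defun integer->character (code)
  "Return the character whose code is CODE, or NIL if CODE is out of range
or names no character in this implementation."
  (when (and (integerp code)
             (<= 0 code)
             (< code char-code-limit))
    (code-char code)))

;; Round trip through CHAR-CODE and back.
(integer->character (char-code #\A))
```

The range check is there because CODE-CHAR is only defined for valid
character codes; which codes map to which characters remains up to the
implementation.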

>>
>>    Page 23 proposes an equivalence of CHAR-INT and CHAR-CODE, and of INT-CHAR
>>    and CODE-CHAR.  This is unnecessary and should be removed.

Right, we made explicit a restriction not in CLtL but meeting
(we thought) current practice. If it does actually meet current
practice we believe it is valid to include in the proposal.
Do you have some counter experience?

>>
>>    The last bullet on page 23 should be removed.  Part of the definition of
>>    each implementation-defined character attribute must be whether or not that
>>    attribute is removed from symbol names by READ.  Also the phrase "symbol
>>    construction" is ambiguous (does it mean READ or INTERN or MAKE-SYMBOL?)
>>    and should be avoided.

We'll make these changes.  (it is intended to mean READ).

>>
>>    Page 30 (referring to paragraph 24) and page 31 (referring to paragraph 2)
>>    amend MAKE-SEQUENCE and MAKE-STRING.  There are several problems:  It fails
>>    to make (MAKE-SEQUENCE 'STRING n) equivalent to (MAKE-STRING n), including
>>    handling of the presence or absence of the :INITIAL-ELEMENT option.  It
>>    fails to specify the default for the :ELEMENT-TYPE argument to MAKE-STRING.
>>    Earlier there was much controversy about whether by default strings should
>>    be base or extended, so it's really unfortunate that the proposal fails to
>>    take any stand on this issue.  We propose that (MAKE-STRING n) and
>>    (MAKE-SEQUENCE 'STRING n) return a base-string by default.  When the
>>    :INITIAL-ELEMENT option is specified, they return the most specialized
>>    type that can accommodate that character.

Right.  Make-sequence 'string will be defined as equivalent
to make-string but will be deprecated to the compatibility section;
the preferred usage is make-string with :element-type
specified.  Make-string with :element-type omitted will be
implementation defined (e.g., base-string or general-string).


>>
>>
>>    EDITORIAL COMMENTS
>>
>>    Shouldn't there be a reference to relevant ISO document(s) in the
>>    bibliography?

These will be added.

>>
>>    The format of the later portion of the proposal, referring to locations
>>    in CLtL by numbering paragraphs, is hard to follow.  It would help to
>>    mention a page number and a function name.  In general, it is preferable
>>    to propose what the Common Lisp language should be rather than to propose
>>    how Guy Steele's book should be altered.

Well, we compromised.  For the prose discussion the first section
was hoped to be adequate.  Since the changes permeated CLtL we felt
a detailed change notation was necessary to direct the correct
editorial changes.
We'll add the page number and function name indication to the
change section as you suggest.

>>
>>    The page 14 description of the standard character subrepertoire needs an
>>    example.  There is an obvious candidate, namely $.  The ISO character #o044
>>    is a currency sign.  Many ASCII terminals overseas have a glyph other than
>>    dollar sign for this (e.g. Pound Sterling or Yen).

Thanks. We'll add this example to the proposal.

>>
>>    Page 15's table appears to contain some typographical errors (LV22, LX22,
>>    the glyph for capital J is K) so we don't trust the table at all.  Also,
>>    what are these IDs?  They don't appear anywhere else in the proposal.

Yep, you caught some typos, which we will correct (in general
you can trust the table).  The IDs are for identification purposes
only and a footnote will be added to clarify this.  They were
obtained from one of the ISO standards (which we will reference).

∂16-Nov-88  1508	CL-Characters-mailer 	CS Proposal comments. DRAFT: Exten... 
Received: from STONY-BROOK.SCRC.Symbolics.COM (SCRC-STONY-BROOK.ARPA) by SAIL.Stanford.EDU with TCP; 16 Nov 88  15:08:09 PST
Received: from EUPHRATES.SCRC.Symbolics.COM by STONY-BROOK.SCRC.Symbolics.COM via CHAOS with CHAOS-MAIL id 493703; Wed 16-Nov-88 17:42:28 EST
Date: Wed, 16 Nov 88 17:42 EST
From: David A. Moon <Moon@STONY-BROOK.SCRC.Symbolics.COM>
Subject: CS Proposal comments. DRAFT: Exten...
To: Thom Linden <baggins@ibm.com>
cc: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
In-Reply-To: <881114.133702.baggins@IBM.com>
Message-ID: <19881116224218.8.MOON@EUPHRATES.SCRC.Symbolics.COM>

I've excerpted your message to just the portions where I had
further remarks to make.  This time the remarks are just from
me, not from Symbolics as a whole.

    Date: Mon, 14 Nov 88 13:37:02 PST
    From: Thom Linden <baggins@ibm.com>

    >>    Page 10 refers to the representation of coded character sets as keyword
    >>    symbols.  Why not use CLOS objects?  There might be reasons, but you should
    >>    state them.

    We feel that if CLOS is to be ingrained in every CL application
    this might be appropriate.  But that is itself quite arguable.
    In any case, the CLOSification of CL is not a topic of this
    proposal and should not be done piecemeal.

I think we erred by mentioning CLOS in our comment.  The real thrust of the
comment was why use names rather than objects?  The objects should be defined
only by standard functions that construct and operate on them; whether their
implementation is in terms of CLOS, DEFSTRUCT, or something else need not be
specified.  I don't think we intended to propose that there be a standard
way for users to define subclasses of these object types.

    >>    Pages 19 and 20 introduce a new type named simple-base-string, in addition
    >>    to simple-string.  If you think about how simple-string would be used for
    >>    compiler optimization, it makes sense for simple-string to be the name for
    >>    the single simplest representation, rather than a name for a whole family
    >>    of representations that would have to be discriminated at run time.  Thus
    >>    what you call simple-base-string should be called simple-string, and what
    >>    you call simple-string should just be called (simple-array character (*)).
    >>    This would not be an incompatible change in the meaning of simple-string.
    >>    Simple-string would be analogous to simple-vector.

    Simple-string has the same problem as string, i.e., it is ambiguous,
    meaning a union of simple-string subtypes. We
    are now in favor of making the same modifications for simple-string as
    for string.  Analogous to string, we will have simple-base-string and
    simple-general-string.  schar will be deprecated as a compatibility
    function. Two new functions, sbchar and sgchar, will be
    introduced to operate on simple-base and simple-general
    strings respectively.

I still think the arguments that were marshalled in 1984 against having
simple-vector be a union of several subtypes apply as well to
simple-string, and that what you propose to do is too complex.  Having
three flavors of schar seems excessive, even if one flavor is deprecated.
I would think that the only kind of simple array of characters that needs a
name of its own is the one that is most commonly used, which I think is
(simple-array base-character 1).

    >>    Page 23 proposes an equivalence of CHAR-INT and CHAR-CODE, and of INT-CHAR
    >>    and CODE-CHAR.  This is unnecessary and should be removed.

    Right, we made explicit a restriction not in CLtL but meeting
    (we thought) current practice. If it does actually meet current
    practice we believe it is valid to include in the proposal.
    Do you have some counter experience?

I apologize, what you're proposing is already in CLtL (p.242, second
paragraph of description of CHAR-INT).  I think this was poor design on the
part of the authors of CLtL, and I can't think of any way a program could
depend on that property without grossly violating abstraction.  However,
given that it's already in CLtL, if you don't want to propose to get rid of
it, I won't push you.

    >>    Page 30 (referring to paragraph 24) and page 31 (referring to paragraph 2)
    >>    amend MAKE-SEQUENCE and MAKE-STRING.  There are several problems:  It fails
    >>    to make (MAKE-SEQUENCE 'STRING n) equivalent to (MAKE-STRING n), including
    >>    handling of the presence or absence of the :INITIAL-ELEMENT option.  It
    >>    fails to specify the default for the :ELEMENT-TYPE argument to MAKE-STRING.
    >>    Earlier there was much controversy about whether by default strings should
    >>    be base or extended, so it's really unfortunate that the proposal fails to
    >>    take any stand on this issue.  We propose that (MAKE-STRING n) and
    >>    (MAKE-SEQUENCE 'STRING n) return a base-string by default.  When the
    >>    :INITIAL-ELEMENT option is specified, they return the most specialized
    >>    type that can accommodate that character.

    Right.  Make-sequence 'string will be defined as equivalent
    to make-string but will be deprecated to the compatibility section;
    the preferred usage is make-string with :element-type
    specified.  Make-string with :element-type omitted will be
    implementation defined (e.g., base-string or general-string).

Wait a minute, I don't think make-sequence should be deprecated.  There is
nothing wrong or old fashioned about it.

I wonder how well leaving the default :element-type to the implementation will
go over.  Perhaps that's necessary if consensus cannot be reached, but it seems
like a potential big source of accidental non-portability.  I confess that I
haven't thought about it very hard.  Let's see, it means that any portable 
program that uses non-base characters must always specify :element-type, and
any portable program that uses only base characters and wants to maximize
space efficiency must always specify :element-type.  That doesn't leave many
programs where it makes sense to omit :element-type.  Probably a lot of
programmers will omit :element-type and then be surprised to discover later
that their programs are non-portable when they seemed to work in a few
implementations.
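
A minimal sketch of the portability point above, assuming an ANSI
Common Lisp implementation (where the proposal's BASE-CHARACTER type is
spelled BASE-CHAR).  The helper name MAKE-BLANK-STRING is hypothetical;
it only illustrates naming the element type explicitly rather than
relying on a default:

```lisp
;; Hypothetical helper: the caller always states which kind of string
;; is wanted, so the result does not depend on an implementation default.
(defun make-blank-string (length &optional (element-type 'character))
  "Return a string of LENGTH spaces whose elements are of ELEMENT-TYPE."
  (make-string length
               :element-type element-type
               :initial-element #\Space))

;; A string able to hold any character the implementation supports:
(make-blank-string 8 'character)

;; A string restricted to base characters, for space efficiency:
(make-blank-string 8 'base-char)
```

Both kinds of program described above, those using non-base characters
and those wanting compact base strings, end up passing :ELEMENT-TYPE,
which is the situation the message is worried about.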

∂17-Nov-88  1220	CL-Characters-mailer     
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 17 Nov 88  12:20:08 PST
Date: Thu, 17 Nov 88 11:15:53 PST
From: Thom Linden <baggins@ibm.com>
To: "David A. Moon" <moon@scrc-stony-brook.arpa>
cc: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <881117.111553.baggins@IBM.com>

David,
  Thanks for your note.

  >>  why use names rather than objects?

We did not introduce operations to create or manipulate character
sets or repertoires (which I'm learning to spell) primarily since
we wanted to minimize the language design in the proposal.
Also, it was not clear that these are needed by users (i.e., no one
was pounding on the door for functions of this type).

  >>  I would think that the only kind of simple array of characters
  >>  that needs a name of its own is the one that is most commonly
  >>  used,

That seems to be the crux of the problem.  The last document was in
line with your suggestions.  Consistent with current CLtL would
be to introduce only simple-most-general-string while many (e.g.,
U.S.-only) applications would want simple-base-string.


>> I don't think make-sequence should be deprecated.

Sorry to have given the wrong impression.  Only make-sequence 'string
would be deprecated, not make-sequence.  Other make-sequences,
including making base-strings and most-general-strings are valid.


Regards,
  Thom

∂28-Nov-88  1543	CL-Characters-mailer 	extended character sets in symbols    
Received: from Xerox.COM by SAIL.Stanford.EDU with TCP; 28 Nov 88  15:41:18 PST
Received: from Semillon.ms by ArpaGateway.ms ; 28 NOV 88 15:35:01 PST
Date: 28 Nov 88 15:33 PST
From: masinter.pa@Xerox.COM
Subject: extended character sets in symbols
To: cl-characters@sail.stanford.edu
cc: Fischer.aisnorth@Xerox.COM
Message-ID: <881128-153501-3504@Xerox>

I had assumed that the proposal on character handling in strings would
retain the requirement that all strings were admissible arguments to
INTERN, and that 

(string-equal (symbol-name (intern string)) string)

for all valid strings.

This would mean that extended characters, if supported by an
implementation, would also be allowable in symbol names.

I've heard from some people who have read the "DRAFT: Extensions to Common
LISP to Support ...." who weren't sure that was an implication of the
proposal.

I'd like to make sure it is explicit. 

Did any of you intend otherwise?
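
A minimal sketch of the round-trip property stated above, assuming an
ANSI Common Lisp implementation.  The function name
INTERN-ROUND-TRIPS-P is hypothetical, and STRING= is used in place of
the STRING-EQUAL in the message as a stricter, case-sensitive check;
that substitution is an assumption of this sketch:

```lisp
;; Hypothetical predicate: does interning STRING preserve its characters?
(defun intern-round-trips-p (string &optional (package *package*))
  "True if the symbol obtained by interning STRING in PACKAGE has a
name equal to STRING, character for character."
  (string= (symbol-name (intern string package)) string))

;; INTERN does no case conversion, so mixed case is preserved.
(intern-round-trips-p "Mixed Case Name")
```

If extended characters are allowed in strings, the same check applied
to a string containing them is what would make their admissibility in
symbol names explicit.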



∂28-Nov-88  1744	CL-Characters-mailer 	extended characters in symbols   
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 28 Nov 88  17:44:07 PST
Date: Mon, 28 Nov 88 16:26:44 PST
From: Thom Linden <baggins@ibm.com>
To: Larry Masinter <masinter.pa@xerox.com>
cc: cl-characters@sail.stanford.edu, Fischer.aisnorth@Xerox.COM
Message-ID: <881128.162644.baggins@IBM.com>
Subject: extended characters in symbols


>>    I had assumed that the proposal on character handling in strings would
>>    retain the requirement that all strings were admissible arguments to
>>    INTERN, and that
>>
>>    (string-equal (symbol-name (intern string)) string)
>>
>>    for all valid strings.
>>
>>    This would mean that extended characters, if supported by an
>>    implementation, would also be allowable in symbol names.
>>
>>    I've heard from some people who have read the "DRAFT: Extensions to Common
>>    LISP to Support ...." who weren't sure that was an implication of the
>>    proposal.
>>
>>    I'd like to make sure it is explicit.
>>
>>    Did any of you intend otherwise?
>>

That is the intention.  We'll add your example since someone felt
this was in question.

Regards,
  Thom

∂06-Dec-88  0207	CL-Characters-mailer 	cs proposal  
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 6 Dec 88  02:07:21 PST
Date: Mon, 05 Dec 88 16:22:15 PST
From: Thom Linden <baggins@ibm.com>
To: Larry Masinter <masinter.pa@xerox.com>
cc: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <881205.162215.baggins@IBM.com>
Subject: cs proposal

Larry,
    Your statement that nothing in CL *requires* change is probably
  correct but is not really relevant.  The fact of the matter is
  that existing and planned implementations of CL are providing
  support with idiosyncratic syntax as well as semantics.  Why?
  Among the reasons are: CLtL doesn't say anything about it,
  developers like to invent, differences in performance criteria,
  differences in user requirements, etc.

    In defining a common syntax and semantics, we also acknowledge that
  flexibility is required in the language definition.  Fixnum
  is a good example.  Everyone complains of the Fixnum problem in C but
  performance criteria have outweighed a rigid solution.  Similarly,
  your suggestion for using forwarding pointers for 'fattening' strings
  has adverse performance behavior on conventional hardware.

    In one of your examples, you show that base-character is not,
  in general, portable.  This is correct.  Base-character is
  a pragmatic acceptance of the need for a level of portability
  more general than standard-char, but less general than
  most-general-character.  There are others but these were
  not given standardized names (e.g., region-specialized).  To the
  extent that applications are ported across machines with
  identical base repertoires, the base-character type will be
  of use.


Regards,
  Thom

∂06-Dec-88  0207	CL-Characters-mailer 	cs proposal  
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 6 Dec 88  02:07:47 PST
Date: Mon, 05 Dec 88 16:30:11 PST
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <881205.163011.baggins@IBM.com>
Subject: cs proposal

Dave Unietis and I are meeting tomorrow to revise the proposal.  After
this is completed, I will forward the document to you for final
review.  Please forward any specific changes/corrections asap
to me for inclusion.

Bob,
  I recall a response mentioning some comments from ILA but I haven't
seen them.  Have you heard anything that should be taken into account?

Regards,
  Thom

∂06-Dec-88  1057	CL-Characters-mailer 	Re: cs proposal   
Received: from Xerox.COM by SAIL.Stanford.EDU with TCP; 6 Dec 88  10:56:54 PST
Received: from Semillon.ms by ArpaGateway.ms ; 06 DEC 88 10:54:10 PST
Date: 6 Dec 88 10:53 PST
From: masinter.pa@Xerox.COM
Subject: Re: cs proposal
In-reply-to: Thom Linden <baggins@ibm.com>'s message of Mon, 05 Dec 88
 16:22:15 PST
To: Thom Linden <baggins@ibm.com>
cc: Larry Masinter <masinter.pa@Xerox.COM>, "X3J13: Character Subcommittee"
 <cl-characters@sail.stanford.edu>
Message-ID: <881206-105410-6172@Xerox>

Thanks for your response, Thom.

I believe that:
a) we should make changes to Common Lisp (as per CLtL) only in response to
actual problems,
b) the changes we propose should actually solve the problems we identify, and
c) we should justify those changes by showing how they fix the problems.

You've identified some possible "problems"
* "CLtL doesn't say anything about it"
 -- I reject this, as I think the CHAR-CODE-LIMIT constant
   was introduced exactly to allow multiple character sets

* "developers like to invent"
  -- I reject this too, as it doesn't seem like a good reason for
   standards bodies to invent

* "differences in performance criteria"
  -- I accept this as a valid reason to change Common Lisp;
    following CLtL could result in poorer performance
   for some implementations

* "differences in user requirements"
  -- I'd accept this if I understood more clearly what those
   requirements were. I've not seen them spelled out
   in any document, although I've heard some of them
  alluded to in some conversations. One of the risks we
  should try to avoid is satisfying perceived user requirements
  when those user requirements are not really there.
  I think this happened with char-font in CLtL, for example,
  where some extra nonsense got added to CLtL only
  to discover that the requirement either was absent or
  the facility proposed didn't satisfy the requirement.

I agree that it is likely that using a forwarding pointer for 'fattening'
strings might have adverse performance behavior on conventional hardware,
but I think it deserves more than a hand-wave consideration. My
back-of-the-envelope calculation of the extra overhead for the level of
indirection is a maximum of 5% on the most string intensive benchmark I can
construct. 
As long as strings can have variable "width", you cannot avoid a width
fetch & test. To fetch the extra pointer to the string base as well as the
width would slow down a SCHAR by 30%, and it is hard to construct any
realistic program that has more than 1/6 of the "gross" operations
consisting of SCHARs. Perhaps a counterexample can be constructed, but I
think, before changing Common Lisp in fairly radical ways--making STRING a
"nest" of types rather than (VECTOR CHARACTER) is radical--it
deserves that kind of analysis.
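
The back-of-the-envelope figure above follows from multiplying the two
numbers given: a 30% slowdown per SCHAR, applied to at most 1/6 of all
operations.  A small sketch of that arithmetic, with the hypothetical
function name OVERALL-SLOWDOWN and the simplifying assumption that all
operations cost the same:

```lisp
;; Hypothetical helper: overall slowdown when only a fraction of the
;; operations is affected, assuming every operation has equal cost.
(defun overall-slowdown (per-operation-slowdown fraction-of-operations)
  "Return the overall slowdown as an exact fraction of total run time."
  (* per-operation-slowdown fraction-of-operations))

;; 30% per SCHAR, SCHARs being 1/6 of the work:
(overall-slowdown 30/100 1/6)   ; => 1/20, i.e. 5%
```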

You say "Base-character is a pragmatic acceptance of the need for a level
of portability more general than standard-char, but less general than
most-general-character."

I don't understand how a level of portability can be more or less general. 

"To the extent that applications are ported across machines with identical
base repertoires, the base-character type will be of use."

I think this strengthens my point rather than weakens it: I claim that
implementations should name the repertoires of their base character sets in
the registry, and that applications should name the repertoire (e.g.,
(CHARACTER :ASCII)). When applications are ported across machines with
identical base repertoires, the declarations will have the same effect; if
the application is ported to a machine with a different base repertoire, a
declaration will have the same *meaning*, and will be interpreted
appropriately (or else ignored) by the target implementation. 


∂06-Dec-88  1509	CL-Characters-mailer 	problems
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 6 Dec 88  15:09:38 PST
Date: Tue, 06 Dec 88 11:26:51 PST
From: Thom Linden <baggins@ibm.com>
To: Larry Masinter <masinter.pa@xerox.com>
cc: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <881206.112651.baggins@IBM.com>
Subject: problems

Larry,
  The reasons I mentioned were not meant as a list of rationales for
a standard, but rather as a partial list of why developers implement
different syntax and semantics for ichs support.


∂03-Jan-89  0339	CL-Characters-mailer 	cs proposal  
Received: from Riverside.SCRC.Symbolics.COM (SCRC-RIVERSIDE.ARPA) by SAIL.Stanford.EDU with TCP; 3 Jan 89  03:39:34 PST
Received: from F.ILA.Dialnet.Symbolics.COM (FUJI.ILA.Dialnet.Symbolics.COM) by Riverside.SCRC.Symbolics.COM via DIAL with SMTP id 305295; 3 Jan 89 06:37:18 EST
Date: Tue, 3 Jan 89 05:35 EST
From: Robert W. Kerns <RWK@F.ILA.Dialnet.Symbolics.COM>
Subject: cs proposal
To: baggins%ibm.com@RIVERSIDE.SCRC.SYMBOLICS.COM, cl-characters%sail.stanford.edu@RIVERSIDE.SCRC.SYMBOLICS.COM
In-Reply-To: <881205.163011.baggins@IBM.com>
Supersedes: <19890103091441.7.RWK@F.ILA.Dialnet.Symbolics.COM>
Comments: Retransmission of failed mail.
Message-ID: <19890103103543.3.RWK@F.ILA.Dialnet.Symbolics.COM>

    Date: Mon, 05 Dec 88 16:30:11 PST
    From: Thom Linden <baggins@ibm.com>

    Dave Unietis and I are meeting tomorrow to revise the proposal.  After
    this is completed, I will forward the document to you for final
    review.  Please forward any specific changes/corrections asap
    to me for inclusion.

    Bob,
      I recall a response mentioning some comments from ILA but I haven't
    seen them.  Have you heard anything that should be taken into account?

Sorry, I just got back from Japan a few days ago.  I ran out of time
before I left and didn't get to take care of all my X3J13 business.
I'll try to get to it in the next couple of days (sigh).

∂10-Jan-89  0756	CL-Characters-mailer 	proposal
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 10 Jan 89  07:56:21 PST
Date: Mon, 09 Jan 89 19:43:15 PST
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <890109.194315.baggins@IBM.com>
Subject: proposal

After major delays, the revision is almost out.  Due to this I
won't ask for a vote unless 1) you all agree and 2) J13 seems willing.

Regards,
  Thom

∂10-Jan-89  0756	CL-Characters-mailer 	Hawaii  
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 10 Jan 89  07:56:50 PST
Date: Mon, 09 Jan 89 19:57:49 PST
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <890109.195749.baggins@IBM.com>
Subject: Hawaii

I would like our group to get together.  My preference is
Sunday or Monday for about 1 hour.

Regards,
  Thom

∂12-Jan-89  2000	CL-Characters-mailer 	cs proposal  
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 12 Jan 89  20:00:17 PST
Date: Thu, 12 Jan 89 17:18:53 PST
From: Thom Linden <baggins@IBM.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <890112.171853.baggins@almvma>
Subject: cs proposal

Well, it's out.  I'm looking forward to hearing your reaction to the
modifications.  Some resulted from the meeting Larry, Dave and I
had a few weeks back.  Others I slipped in myself.  We have been
put on Wednesday's agenda so let's plan on getting together
Monday when the afternoon session ends.  I'm asking Jan for a
room (we'll make do somewhere on the veranda if one is not available).

Aloha,
  Thom

∂12-Jan-89  2000	CL-Characters-mailer 	travel  
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 12 Jan 89  20:00:31 PST
Date: Thu, 12 Jan 89 17:29:46 PST
From: Thom Linden <baggins@IBM.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <890112.172946.baggins@almvma>
Subject: travel

I'm on a flight Friday at noon.  I probably won't read any mail after
tonight.

Thom

∂12-Jan-89  2339	CL-Characters-mailer 	sub meeting  
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 12 Jan 89  23:39:49 PST
Date: Thu, 12 Jan 89 22:09:19 PST
From: Thom Linden <baggins@IBM.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <890112.220919.baggins@almvma>
Subject: sub meeting

The Monday afternoon meeting at 3 replaces the Sunday evening one
I suggested.  I'll be in Friday evening.  If anyone wants to
get together and meet as well sometime Sat or Sunday, contact me
at the Sheraton.

Aloha,
  Thom

∂23-Jan-89  2040	CL-Characters-mailer 	character committee issues  
Received: from Xerox.COM by SAIL.Stanford.EDU with TCP; 23 Jan 89  20:40:49 PST
Received: from Semillon.ms by ArpaGateway.ms ; 23 JAN 89 20:37:35 PST
Date: 23 Jan 89 20:36 PST
From: masinter.pa@Xerox.COM
Subject: character committee issues
To: cl-characters@sail.stanford.edu
cc: masinter.pa@Xerox.COM
Message-ID: <890123-203735-2833@Xerox>

What I believe I recommended for action in the character committee is to
prepare a ballot on the separable issues in the character proposal,
retaining the document in toto, and then modifying the document, if
necessary, to reflect the results of the ballot.

The  areas I think are separable are as follows. They are independent of
each other (that is, I think it is possible to have any subset of these
pass) except where noted. I think these issues 'cover' the current
character proposal.

Issue: CHAR-FONT-UNUSED
* elimination of CHAR-FONT and CHAR-BITS, related parameters, and the
STRING-CHAR type. (I.e., identification of STRING-CHAR with CHARACTER). 

If this fails, or if people only wanted to eliminate CHAR-FONT and not
CHAR-BITS, most of the rest of the proposal gets more complicated & will
have to be rewritten.

Issue: STRING-TYPE-RESTRICTIVE
* change to the STRING type to be "all vectors with element type a subtype
of CHARACTER" rather than (VECTOR CHARACTER).  Specify the (modified)
behavior of various functions that take a type specifier when given STRING.

This is the central part of the proposal.

Issue: STRING-TYPE-ABBREVIATIONS
* add convenient abbreviations BASE-CHARACTER, BASE-STRING, GENERAL-STRING,
MOST-GENERAL-STRING, etc.

This requires STRING-TYPE-RESTRICTIVE.

Issue: FILE-EXTERNAL-REPRESENTATION
* add standard :external-code-format keyword to open, with unspecified
range.

Issue: CHARACTER-IDENTIFICATION-NONPORTABLE
* introduce the notion of Registries, require a fixed set of registries,
standardize on #\registry:id, add all-implemented-registries and
find-character.

This part deals with the mechanism by which characters can be identified
portably between implementations that do not share the same coded character
set.

Issue: CHARACTER-FUNCTIONS-UNDERSPECIFIED
* (do not) specify the 'intent' of the behavior of ALPHA-CHAR-P,
LOWER-CASE-P etc for alphabetic and non-alphabetic scripts (e.g., works for
Greek, no-op for Hangul, etc.)


 

∂24-Jan-89  1231	CL-Characters-mailer 	Comments on the Character proposal dated January 1, 1989  
Received: from ALDERAAN.SCRC.Symbolics.COM ([128.81.41.109]) by SAIL.Stanford.EDU with TCP; 24 Jan 89  12:31:27 PST
Received: from EUPHRATES.SCRC.Symbolics.COM by ALDERAAN.SCRC.Symbolics.COM via CHAOS with CHAOS-MAIL id 263137; Tue 24-Jan-89 14:46:01 EST
Date: Tue, 24 Jan 89 14:46 EST
From: David A. Moon <Moon@STONY-BROOK.SCRC.Symbolics.COM>
Subject: Comments on the Character proposal dated January 1, 1989
To: Thom Linden <Baggins@IBM.COM>
cc: CL-Characters@SAIL.STANFORD.EDU, X3J13@SAIL.STANFORD.EDU, Common-Lisp-Implementors@STONY-BROOK.SCRC.Symbolics.COM,
    KMP@STONY-BROOK.SCRC.Symbolics.COM, Palter@STONY-BROOK.SCRC.Symbolics.COM
Message-ID: <19890124194625.1.MOON@EUPHRATES.SCRC.Symbolics.COM>

Please acknowledge receipt of this mail so I can be sure it was
not lost in the network.  The reply needn't be CC'ed to any of
the other recipients.

Page 6 -- *all-registry-names* should be renamed to
*all-character-registry-names*; the word "registry" by itself
is too general.

Page 9 -- the fourth bullet requires a defined total ordering of all
characters.  This seems unnecessary, and is impossible to implement in any
system (such as Symbolics Genera) that allows dynamic addition of character
registries by third-party software vendors and by users; in such a system
character codes have to be allocated dynamically and therefore their order
cannot be fixed ahead of time.

Page 9 -- This says an implementation must define the result of
standard-char-p on the characters it supports.  I think that is incorrect.
Common Lisp fully defines the result of standard-char-p, which is NIL
for all characters added by an implementation.

Page 14 -- This EXTERNAL-WIDTH function probably should be part of a
database facility or a terminal screen template facility; I'm not sure it
is useful by itself.  Also note that its result is only meaningful with
respect to a specific state of the stream.  To give two examples, with the
SO/SI encoding the answer can vary by 1 depending on whether the stream is
already shifted into the correct state for the first character; with the
universal encoding Symbolics uses, the answer can vary by a lot depending on
whether the character repertoires appearing in the string have been used
earlier on the same stream (and hence have been assigned encoding numbers).
Because of this dependence on the state of the stream, I cannot think of
any correct use of EXTERNAL-WIDTH that does not involve immediately
outputting the string to the stream.  Therefore I believe the same effect
can be achieved without adding any new functions, by calling FILE-POSITION,
outputting to the stream, calling FILE-POSITION again, and subtracting.  If
you still want to propose this feature, you should change the name: use
"length" instead of "width", since that's the word Common Lisp always uses,
and use a name that relates to the :EXTERNAL-CODE-FORMAT option to OPEN;
for example, STRING-LENGTH-IN-EXTERNAL-CODE-FORMAT or
EXTERNAL-CODED-STRING-LENGTH.
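
A minimal sketch of the FILE-POSITION approach described above,
assuming an ANSI Common Lisp implementation and a stream that supports
FILE-POSITION.  The function name OUTPUT-LENGTH-ON-STREAM is
hypothetical and is not one of the names proposed in this message:

```lisp
;; Hypothetical helper: measure how far the stream advances when STRING
;; is actually written, instead of predicting it in advance.
(defun output-length-on-stream (string stream)
  "Write STRING to STREAM and return the change in FILE-POSITION,
or NIL if STREAM cannot report a position."
  (let ((start (file-position stream)))
    (write-string string stream)
    ;; Push buffered output through before asking for the new position.
    (finish-output stream)
    (let ((end (file-position stream)))
      (when (and start end)
        (- end start)))))
```

Because the string is written as part of the measurement, the result
reflects the stream's state at that moment, which is the dependence
noted above.  The units FILE-POSITION reports for a character stream
are left to the implementation, so the value is only comparable with
other positions on the same stream.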

Page 24 -- I can't figure out what you intend the meaning of SIMPLE-STRING
to be.  Your report mostly does not mention it, but it doesn't say to
remove it either.  If I have correctly correlated page 24 back to CLtL, you
are defining SIMPLE-STRING to be synonymous with SIMPLE-GENERAL-STRING.
Maybe what you really meant, though, was what you said in November you
would do, which was to make SIMPLE-STRING mean (AND STRING SIMPLE-ARRAY),
in other words a union of several subtypes.  This is particularly confusing
because Common Lisp uses the name SIMPLE-VECTOR to mean what you might call
a simple general vector, that is, (SIMPLE-ARRAY T 1) rather than
(SIMPLE-ARRAY * 1).  Here are my suggestions for what to do with the
various names for string subtypes:

  STRING                  As a union of all strings, this is fine.
  GENERAL-STRING          I think (VECTOR CHARACTER) is just as good.
  BASE-STRING             I think (VECTOR BASE-CHARACTER) is just as good.
  SIMPLE-STRING           Should mean (SIMPLE-ARRAY CHARACTER 1).
  SIMPLE-BASE-STRING      This is fine.
  SIMPLE-GENERAL-STRING   This name is horrible, use SIMPLE-STRING.

My rationale for these suggestions largely comes from thinking about
which of these names would ever be used in type declarations and about
how these names relate to the other names already in Common Lisp.  To
repeat older comments:

  Pages 19 and 20 introduce a new type named simple-base-string, in addition
  to simple-string.  If you think about how simple-string would be used for
  compiler optimization, it makes sense for simple-string to be the name for
  the single simplest representation, rather than a name for a whole family
  of representations that would have to be discriminated at run time.  Thus
  what you call simple-base-string should be called simple-string, and what
  you call simple-string should just be called (simple-array character (*)).
  This would not be an incompatible change in the meaning of simple-string.
  Simple-string would be analogous to simple-vector.
          
I changed my mind slightly on that and now claim that while SIMPLE-STRING
should still be a single representation, not a union, it should be the
representation that can hold all characters.  This is both because of the
principle that correct programs should be easier to write than
extra-efficient programs, and because of the powerful analogy with the name
SIMPLE-VECTOR.  Then the name SIMPLE-BASE-STRING is also needed for
convenient type declarations of the more efficient but less functional
string representation.  That name is good, by analogy to BASE-CHARACTER.

Adopting the above suggestions helps you decide what to do about the
SCHAR, SBCHAR, and SGCHAR mess.  First of all, you only need two functions,
not three, because there are only two specified specialized representations.
SCHAR should be for what I've called SIMPLE-STRING, SBCHAR should be
for SIMPLE-BASE-STRING, and SGCHAR is not needed.  (In fact I would prefer
to remove all of the specialized versions of AREF from the language, in
favor of THE or type declarations, but I know that would only pass over
some people's dead bodies so I won't push it.)
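
To illustrate the parenthetical remark, here is a sketch of the same
access written with the specialized accessor and with a type declaration
plus plain AREF.  Both function names are hypothetical, and how much a
compiler gains from the declaration is an assumption that varies by
implementation.

```lisp
;; Specialized accessor: the argument must be a simple string.
(defun first-char-via-schar (s)
  (schar s 0))

;; Same access with a declaration and the general accessor AREF.
(defun first-char-via-declaration (s)
  (declare (type simple-string s))
  (aref s 0))
```

Either form gives the compiler the same type information; the declaration
form just avoids needing one accessor name per string representation.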

In case you are wondering, I have no quarrel with the name BASE-CHARACTER
and would not want to see it removed.  I guess I differ from Larry here,
unless I erred when I wrote down his comments during the meeting.

Page 25 -- The discussion of STRING and SIMPLE-STRING thinks that there
is a distinction between declaration and discrimination, but Common Lisp
no longer has such a distinction.  Even when Common Lisp did have such
a distinction, the meanings for declaration stated here were incorrect.

Page 29 -- *all-character-registry-names* has to be a variable, not a
constant, to accommodate systems (such as Symbolics Genera) that allow
dynamic addition of character registries by third-party software vendors
and by users.

Page 35 -- CHAR-REGISTRY should be renamed to CHAR-REGISTRY-NAME, so that
if at some later time character registry objects are added, there is no
possibility of confusion about whether this function returns a name or
an object.

Page 40 -- the default :ELEMENT-TYPE for OPEN cannot be BASE-CHARACTER.  I
think this was discussed at the X3J13 meeting.  The report suffers from a
confusion between two meanings of BASE-CHARACTER: the character type
implemented most efficiently by the Lisp, and the character type most
natural to the file system.  These are not always the same.  Furthermore,
in a network-based system that supports multiple file systems equally
(Symbolics Genera is an example), each file system might have a different
natural character type.  BASE-CHARACTER should just mean the character type
implemented most efficiently by the Lisp.  The default for :ELEMENT-TYPE
has two viable choices that I can see, and maybe you should just propose
both and let people vote:

  (1) CHARACTER.  This matches the behavior of MAKE-STRING and friends,
  adheres to the principle that writing correct programs should be easier
  than writing extra-efficient programs (since making a program correct
  requires making every part of it correct, while making a program
  efficient only requires improving the bottlenecks), and doesn't cost
  anything in implementations that don't have extended characters.

  (2) The most natural type for the particular pathname being opened.
  In some systems this would be a constant, and in a subset of those
  systems this would be BASE-CHARACTER; however, in general this might
  depend on the host, device, or even type fields of the pathname,
  and might also depend on information stored in the file system.
  In general this would always be an (improper) supertype of
  BASE-CHARACTER, but it's probably a bad idea to make that a requirement,
  as some file systems might not be able to implement it conveniently.
  Again this doesn't cost anything in implementations that don't have
  extended characters.

The relationship of option 2 to :ELEMENT-TYPE :DEFAULT (a feature that
already exists in Common Lisp) needs to be clarified.  Perhaps they
are the same.

Also the following promise from 14 November did not show up in the report:

  >>     There should be a name for the "natural" encoding and there should be a
  >>     specification of the properties of the natural encoding that a programmer
  >>     can rely on.  Suggestions for the name include :BASE, :NATURAL, and
  >>     :INTERCHANGE.  The definition probably involves the concept of data
  >>     interchange with non-Lisp programs on the same system.
  
  This will be added to the revision.

Appendix B -- I disagree with the way you've used deprecation.  I'll 
comment on each individual point:
 - I see no justification for deprecating STANDARD-CHAR.
 - I agree that STRING-CHAR should be deprecated, not deleted nor kept.
 - I think fonts and bits should be removed outright, not deprecated,
   because no portable program could possibly be using them.
 - I think the CHAR-INT function needs to be kept, although the INT-CHAR
   function should go away.  This is for hashing.  See comments below
   on character attributes.

No particular page -- the use of strings for naming registries, labelling
characters, and naming external code formats is objectionable.  Nothing
else in Common Lisp is named by strings.  Use of strings might lead to
efficiency problems.  We feel that keyword symbols are the appropriate
objects to use for these three kinds of names.

No particular page -- We agree with the deprecation or deletion of the two
particular character attributes defined by CLtL, but not with the
deprecation of the whole concept of character attributes.  In fact on page
20 you say "characters are uniquely distinguished by their codes," which
makes it impossible to have character attributes at all.  The language must
define how conforming programs should be written so that they will work
both in implementations with character attributes and in implementations
without them.  For example, the value of (eql x (code-char (char-code x)))
is unspecified.  Another thing that needs to be said is that the exact
character operations (char=, string=, etc.) respect all character
attributes, while the inexact character operations (char-equal,
string-equal, etc.) respect or ignore each character attribute in an
implementation-defined but consistent fashion.  Some of what you say on
page 44 about attributes in general needs to be part of the spec, not
deprecated.  I would retain everything on that page except for INT-CHAR and
the last bullet (referring to bits and fonts), and I would add a remark
that FIND-SYMBOL and INTERN respect character attributes.  If you want,
perhaps I or someone else at Symbolics can provide exact text for what
to say about character attributes that you could insert into your report.

No particular page -- On the subject of defining character registries in a
separate document, and relating them to ISO standards for character
encoding: I think that's fine.  I don't see anything wrong with introducing
the concept of character registry and the requirement that each character
object relates to exactly one registry.  However, I think the somewhat
random list of character registries on pages 7-8 and again on page 21 does
not belong in the language specification.  Even the names of the
standardized character registries belong in the character registry
standard, not in the Common Lisp language standard.  I'm confused about the
meaning of BASE, STANDARD, and CONTROL as character registry names; these
are mentioned in your report but not explained very well.  If these are
character registries that are required to exist in all Common Lisp
implementations, then unlike the others they do belong in the Common Lisp
language standard, not in the character registry standard.

At the meeting there was some discussion about the issue of enumerating all
characters in a character registry.  People claimed incorrectly that it was
impossible.  In fact it's possible to do this, with questionable
efficiency, by the following program:

  (dotimes (code char-code-limit)
    (let ((char (code-char code)))
      (when char
        (when (eq (char-registry-name char) desired-registry-name)
          ... process this char ...))))

Of course you have to change the EQ to EQUALP if you continue to use
strings to name character registries.  For more efficiency, you could add
a way to iterate over all the codes in one character registry, but I think
that is unnecessary.


TYPOS:

25 -- base-string is missing from the Table 4-1 amendment.

26 -- general-string is not an array of BASE characters, also the first
two paragraphs under A.4.8 are garbled (the two separate sentences for
strings for symbols got smushed together).

37 -- This says the default for the :ELEMENT-TYPE option to MAKE-STRING
is SIMPLE-STRING.  Actually it's CHARACTER.


VOTING:

You asked for suggestions on how to modularize the voting.  Here is a
possible breakdown into separate issues, admittedly not very well thought
out.  In general, I feel that breaking down the voting into separate issues
is a good idea even if some of the issues are interdependent.  If people
don't understand the interdependencies well enough to vote properly, then I
would claim that the subcommittee report hasn't explained things well
enough.

Concept of character registries and character labels
Functions and variables for character registries and character labels
Page 9 rules for implementation-defined character registries
New syntax for #\
New syntax for the CHARACTER type specifier, new argument to CHARACTERP
New rules for names of symbols
Deprecation of STRING-CHAR
Deprecation of STANDARD-CHAR
BASE-CHARACTER type
New meaning of STRING type, and its subtypes
SCHAR
SBCHAR
SGCHAR
Type returned by (CONCATENATE 'STRING ...) and similar forms
Extensions to COERCE
EXTERNAL-WIDTH function
:ELEMENT-TYPE option to OPEN
:EXTERNAL-CODE-FORMAT option to OPEN
CHAR-CODED-CHARACTER-SET-VALUE function
Rules for implementation-defined character attributes
Removal or deprecation of bits and fonts, and implied arglist changes
Removal or deprecation of the semi-standard format effector characters
Support of extended characters in the readtable
Removal or deprecation of CHAR-INT
Removal or deprecation of INT-CHAR
Miscellaneous other and editorial changes

Wow, 26 ballot items!  Democracy on the march!  I think the complexity of
the issues amply justifies having about that many separate issues, though.

∂24-Jan-89  1312	CL-Characters-mailer 	Comments on the Character proposal dated January 1, 1989  
Received: from Think.COM by SAIL.Stanford.EDU with TCP; 24 Jan 89  13:12:07 PST
Return-Path: <barmar@Think.COM>
Received: from sauron.think.com by Think.COM; Tue, 24 Jan 89 15:47:56 EST
Received: from OCCAM.THINK.COM by sauron.think.com; Tue, 24 Jan 89 16:08:30 EST
Date: Tue, 24 Jan 89 16:09 EST
From: Barry Margolin <barmar@Think.COM>
Subject: Comments on the Character proposal dated January 1, 1989
To: David A. Moon <Moon@stony-brook.scrc.symbolics.com>
Cc: Thom Linden <Baggins@ibm.com>, CL-Characters@sail.stanford.edu,
        X3J13@sail.stanford.edu,
        Common-Lisp-Implementors@stony-brook.scrc.symbolics.com,
        KMP@stony-brook.scrc.symbolics.com,
        Palter@stony-brook.scrc.symbolics.com
In-Reply-To: <19890124194625.1.MOON@EUPHRATES.SCRC.Symbolics.COM>
Message-Id: <19890124210921.6.BARMAR@OCCAM.THINK.COM>

    Date: Tue, 24 Jan 89 14:46 EST
    From: David A. Moon <Moon@stony-brook.scrc.symbolics.com>

    No particular page -- the use of strings for naming registries, labelling
    characters, and naming external code formats is objectionable.  Nothing
    else in Common Lisp is named by strings.  Use of strings might lead to
    efficiency problems.  We feel that keyword symbols are the appropriate
    objects to use for these three kinds of names.

I agree with your suggestion.  However, there is at least one global
database in CL that is named by strings: the package name database.
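
A small illustration of that point, using only standard functions.  The
KEYWORD package is chosen here simply because every implementation has it.

```lisp
;; Packages are looked up by name, and the name is a string.
(find-package "KEYWORD")                 ; returns the package object
(package-name (find-package "KEYWORD"))  ; returns the string "KEYWORD"
```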

                                                barmar

∂24-Jan-89  1525	CL-Characters-mailer 	comments on character proposal   
Received: from cs.utah.edu by SAIL.Stanford.EDU with TCP; 24 Jan 89  15:25:15 PST
Received: from defun.utah.edu by cs.utah.edu (5.59/utah-2.1-cs)
	id AA17750; Tue, 24 Jan 89 16:23:51 MST
Received: by defun.utah.edu (5.59/utah-2.0-leaf)
	id AA19939; Tue, 24 Jan 89 16:23:43 MST
From: sandra%defun@cs.utah.edu (Sandra J Loosemore)
Message-Id: <8901242323.AA19939@defun.utah.edu>
Date: Tue, 24 Jan 89 16:23:41 MST
Subject: comments on character proposal
To: cl-characters@sail.stanford.edu

I have some specific comments about the latest proposal, and a couple
of general complaints. 

First for the specific comments.

Getting rid of bits and fonts (section 2.1) seems like a very good
idea to me.  I would argue for deleting these "features" completely
instead of merely deprecating them, because there now seems to be
general agreement that the whole idea was brain-damaged in the first
place, plus it's just about impossible to use them portably anyway
(since implementations are free not to support them).  Deprecating the
features would simply perpetuate the current sad state of affairs into
the ANSI standard.

I am not at all sure why we need to standardize the idea of character
registries at all, much less state that a character can only belong to
one registry, or define a standard set of registries.  What does having
registries buy the user, other than perhaps a way to test whether a
character belongs to one or not?  Why isn't it sufficient just to say
that implementations can support extended characters, and leave it at
that? 

I'm confused about how you propose to handle characters that appear in
more than one character repertoire, and whether characters with accent
marks are considered distinct from characters without accents.  For
example, is the French "C" with a cedilla distinct from a normal
French "C", and is that distinct from the standard-char "C"?

The way the document describes things now, it seems like the Common
Lisp standard would have to include a statement of exactly what
characters belong in each of the standard registries listed in section
2.2.  Otherwise, implementors might go off and define their own
character registries that happen to include some characters that ought
to belong in one of these standard registries.  For instance, the machine
I happen to be sitting in front of right now supports an 8-bit native
character set, and it seems perfectly reasonable for a Lisp running on
this machine to include all 256 characters in its base character set,
but some of those might actually be supposed to live off in some other
registry.

Also in section 2.2, why is it necessary for there to be a total
ordering, or even a partial ordering, of all characters?  It seems
like CHAR< and friends are not very useful except when comparing base
characters anyway.  It seems like it would be difficult to get things
like the Spanish N-with-twiddle character to collate correctly anyway,
given the constraints you have put on how character codes are derived
and the requirement that CHAR< be just like < on the char-codes. 

It doesn't seem like STANDARD-CHAR-P belongs in the list of character
predicates on p. 9, since no extended characters can possibly be
STANDARD-CHAR-P anyway. 

The stuff in section 2.3 seems mostly reasonable to me.  It's not really
clear why you need GENERAL-STRING (as distinct from STRING) and
SIMPLE-GENERAL-STRING (as distinct from SIMPLE-STRING).  Again, some
rationale would be helpful.

In section 2.4, the general idea of specifying an external character
encoding to OPEN seems reasonable.  However, I'm confused by the
business about having more than one coded character set mixed
together.  If a character appears in more than one coded character
set, which encoding takes precedence?  It seems like this has not been
well thought-out.  Also, seeing as though we have just voted down a
proposal to add an EXTERNAL-WIDTH function, it seems like a very bad
idea to lump it in here. 

Now for the general comments.

One thing that is not clear to me from reading this document is how
much of it has already been standardized by ISO.  I share Larry's
concern that we might standardize one thing, and then have ISO go off
and standardize something completely different.  I think it's a
mistake to try to second-guess what ISO might do. 

I am also concerned about trying to standardize things that have not yet
been implemented.  I think it's a mistake to try to do language design
in a standards committee.

Finally, I have some problems with the presentation of your proposal.
One problem, as I mentioned at the meeting, is that you've made it an
all-or-nothing package, and I can't vote for the whole thing because
there are some parts of it that do not seem appropriate, even though I
would support some of the other changes individually.  The other
problem is that Appendix A is virtually unreadable.  Some of the
conceptual changes involve wording changes to several passages, and I
know that there are some other changes in the appendix that are not
mentioned in the introductory blurb at all.  Is it totally impossible
to recast the changes in standard cleanup format proposals?  The
advantage of that format is that it presents more context, including a
clear statement of why the existing CLtL behavior is "broken" and a
rationale for the proposed change. 

I know that we adopted things like the CLOS document that were
presented as single mega-proposals, but those were primarily additions
to the language and what you are proposing is essentially a large
number of incompatible changes.  I'm having a hard time identifying
what all of those changes are.

-Sandra
-------

∂27-Jan-89  1916	CL-Characters-mailer 	[Moon@STONY-BROOK.SCRC.Symbolics.COM: Comments on the Character proposal dated January 1, 1989]   
Received: from STONY-BROOK.SCRC.Symbolics.COM (SCRC-STONY-BROOK.ARPA) by SAIL.Stanford.EDU with TCP; 27 Jan 89  19:16:20 PST
Received: from EUPHRATES.SCRC.Symbolics.COM by STONY-BROOK.SCRC.Symbolics.COM via CHAOS with CHAOS-MAIL id 528661; Fri 27-Jan-89 22:13:03 EST
Date: Fri, 27 Jan 89 22:13 EST
From: David A. Moon <Moon@STONY-BROOK.SCRC.Symbolics.COM>
Subject: [Moon@STONY-BROOK.SCRC.Symbolics.COM: Comments on the Character proposal dated January 1, 1989]
To: Thom Linden <Baggins@IBM.COM>
cc: CL-Characters@SAIL.STANFORD.EDU, KMP@STONY-BROOK.SCRC.Symbolics.COM
Included-msgs: <19890124194625.1.MOON@EUPHRATES.SCRC.Symbolics.COM>,
               The message of 24 Jan 89 14:46 EST from Moon@STONY-BROOK.SCRC.Symbolics.COM,
               The message of 24 Jan 89 14:46 EST from David A. Moon
Message-ID: <19890128031334.4.MOON@EUPHRATES.SCRC.Symbolics.COM>

Please acknowledge receipt of this mail so I can be sure it was
not lost in the network.  The reply needn't be CC'ed to any of
the other recipients.  So far I have no evidence that anyone
other than Masinter has received it.

Date: Tue, 24 Jan 89 14:46 EST
From: David A. Moon <Moon@STONY-BROOK.SCRC.Symbolics.COM>
Subject: Comments on the Character proposal dated January 1, 1989
To: Thom Linden <Baggins@IBM.COM>
cc: CL-Characters@SAIL.STANFORD.EDU, X3J13@SAIL.STANFORD.EDU, Common-Lisp-Implementors@STONY-BROOK.SCRC.Symbolics.COM,
    KMP@STONY-BROOK.SCRC.Symbolics.COM, Palter@STONY-BROOK.SCRC.Symbolics.COM

Please acknowledge receipt of this mail so I can be sure it was
not lost in the network.  The reply needn't be CC'ed to any of
the other recipients.

Page 6 -- *all-registry-names* should be renamed to
*all-character-registry-names*; the word "registry" by itself
is too general.

Page 9 -- the fourth bullet requires a defined total ordering of all
characters.  This seems unnecessary, and is impossible to implement in any
system (such as Symbolics Genera) that allows dynamic addition of character
registries by third-party software vendors and by users; in such a system
character codes have to be allocated dynamically and therefore their order
cannot be fixed ahead of time.

Page 9 -- This says an implementation must define the result of
standard-char-p on the characters it supports.  I think that is incorrect.
Common Lisp fully defines the result of standard-char-p, which is NIL
for all characters added by an implementation.

Page 14 -- This EXTERNAL-WIDTH function probably should be part of a
database facility or a terminal screen template facility; I'm not sure it
is useful by itself.  Also note that its result is only meaningful with
respect to a specific state of the stream.  To give two examples, with the
SO/SI encoding the answer can vary by 1 depending on whether the stream is
already shifted into the correct state for the first character; with the
universal encoding Symbolics uses, the answer can vary by a lot depending on
whether the character repertoires appearing in the string have been used
earlier on the same stream (and hence have been assigned encoding numbers).
Because of this dependence on the state of the stream, I cannot think of
any correct use of EXTERNAL-WIDTH that does not involve immediately
outputting the string to the stream.  Therefore I believe the same effect
can be achieved without adding any new functions, by calling FILE-POSITION,
outputting to the stream, calling FILE-POSITION again, and subtracting.  If
you still want to propose this feature, you should change the name: use
"length" instead of "width", since that's the word Common Lisp always uses,
and use a name that relates to the :EXTERNAL-CODE-FORMAT option to OPEN;
for example, STRING-LENGTH-IN-EXTERNAL-CODE-FORMAT or
EXTERNAL-CODED-STRING-LENGTH.

Page 24 -- I can't figure out what you intend the meaning of SIMPLE-STRING
to be.  Your report mostly does not mention it, but it doesn't say to
remove it either.  If I have correctly correlated page 24 back to CLtL, you
are defining SIMPLE-STRING to be synonymous with SIMPLE-GENERAL-STRING.
Maybe what you really meant, though, was what you said in November you
would do, which was to make SIMPLE-STRING mean (AND STRING SIMPLE-ARRAY),
in other words a union of several subtypes.  This is particularly confusing
because Common Lisp uses the name SIMPLE-VECTOR to mean what you might call
a simple general vector, that is, (SIMPLE-ARRAY T 1) rather than
(SIMPLE-ARRAY * 1).  Here are my suggestions for what to do with the
various names for string subtypes:

  STRING                  As a union of all strings, this is fine.
  GENERAL-STRING          I think (VECTOR CHARACTER) is just as good.
  BASE-STRING             I think (VECTOR BASE-CHARACTER) is just as good.
  SIMPLE-STRING           Should mean (SIMPLE-ARRAY CHARACTER 1).
  SIMPLE-BASE-STRING      This is fine.
  SIMPLE-GENERAL-STRING   This name is horrible, use SIMPLE-STRING.

My rationale for these suggestions largely comes from thinking about
which of these names would ever be used in type declarations and about
how these names relate to the other names already in Common Lisp.  To
repeat older comments:

  Pages 19 and 20 introduce a new type named simple-base-string, in addition
  to simple-string.  If you think about how simple-string would be used for
  compiler optimization, it makes sense for simple-string to be the name for
  the single simplest representation, rather than a name for a whole family
  of representations that would have to be discriminated at run time.  Thus
  what you call simple-base-string should be called simple-string, and what
  you call simple-string should just be called (simple-array character (*)).
  This would not be an incompatible change in the meaning of simple-string.
  Simple-string would be analogous to simple-vector.
          
I changed my mind slightly on that and now claim that while SIMPLE-STRING
should still be a single representation, not a union, it should be the
representation that can hold all characters.  This is both because of the
principle that correct programs should be easier to write than
extra-efficient programs, and because of the powerful analogy with the name
SIMPLE-VECTOR.  Then the name SIMPLE-BASE-STRING is also needed for
convenient type declarations of the more efficient but less functional
string representation.  That name is good, by analogy to BASE-CHARACTER.

Adopting the above suggestions helps you decide what to do about the
SCHAR, SBCHAR, and SGCHAR mess.  First of all, you only need two functions,
not three, because there are only two specified specialized representations.
SCHAR should be for what I've called SIMPLE-STRING, SBCHAR should be
for SIMPLE-BASE-STRING, and SGCHAR is not needed.  (In fact I would prefer
to remove all of the specialized versions of AREF from the language, in
favor of THE or type declarations, but I know that would only pass over
some people's dead bodies so I won't push it.)

In case you are wondering, I have no quarrel with the name BASE-CHARACTER
and would not want to see it removed.  I guess I differ from Larry here,
unless I erred when I wrote down his comments during the meeting.

Page 25 -- The discussion of STRING and SIMPLE-STRING thinks that there
is a distinction between declaration and discrimination, but Common Lisp
no longer has such a distinction.  Even when Common Lisp did have such
a distinction, the meanings for declaration stated here were incorrect.

Page 29 -- *all-character-registry-names* has to be a variable, not a
constant, to accommodate systems (such as Symbolics Genera) that allow
dynamic addition of character registries by third-party software vendors
and by users.

Page 35 -- CHAR-REGISTRY should be renamed to CHAR-REGISTRY-NAME, so that
if at some later time character registry objects are added, there is no
possibility of confusion about whether this function returns a name or
an object.

Page 40 -- the default :ELEMENT-TYPE for OPEN cannot be BASE-CHARACTER.  I
think this was discussed at the X3J13 meeting.  The report suffers from a
confusion between two meanings of BASE-CHARACTER: the character type
implemented most efficiently by the Lisp, and the character type most
natural to the file system.  These are not always the same.  Furthermore,
in a network-based system that supports multiple file systems equally
(Symbolics Genera is an example), each file system might have a different
natural character type.  BASE-CHARACTER should just mean the character type
implemented most efficiently by the Lisp.  The default for :ELEMENT-TYPE
has two viable choices that I can see, and maybe you should just propose
both and let people vote:

  (1) CHARACTER.  This matches the behavior of MAKE-STRING and friends,
  adheres to the principle that writing correct programs should be easier
  than writing extra-efficient programs (since making a program correct
  requires making every part of it correct, while making a program
  efficient only requires improving the bottlenecks), and doesn't cost
  anything in implementations that don't have extended characters.

  (2) The most natural type for the particular pathname being opened.
  In some systems this would be a constant, and in a subset of those
  systems this would be BASE-CHARACTER; however, in general this might
  depend on the host, device, or even type fields of the pathname,
  and might also depend on information stored in the file system.
  In general this would always be an (improper) supertype of
  BASE-CHARACTER, but it's probably a bad idea to make that a requirement,
  as some file systems might not be able to implement it conveniently.
  Again this doesn't cost anything in implementations that don't have
  extended characters.

The relationship of option 2 to :ELEMENT-TYPE :DEFAULT (a feature that
already exists in Common Lisp) needs to be clarified.  Perhaps they
are the same.

Also the following promise from 14 November did not show up in the report:

  >>     There should be a name for the "natural" encoding and there should be a
  >>     specification of the properties of the natural encoding that a programmer
  >>     can rely on.  Suggestions for the name include :BASE, :NATURAL, and
  >>     :INTERCHANGE.  The definition probably involves the concept of data
  >>     interchange with non-Lisp programs on the same system.
  
  This will be added to the revision.

Appendix B -- I disagree with the way you've used deprecation.  I'll 
comment on each individual point:
 - I see no justification for deprecating STANDARD-CHAR.
 - I agree that STRING-CHAR should be deprecated, not deleted nor kept.
 - I think fonts and bits should be removed outright, not deprecated,
   because no portable program could possibly be using them.
 - I think the CHAR-INT function needs to be kept, although the INT-CHAR
   function should go away.  This is for hashing.  See comments below
   on character attributes.

No particular page -- the use of strings for naming registries, labelling
characters, and naming external code formats is objectionable.  Nothing
else in Common Lisp is named by strings.  Use of strings might lead to
efficiency problems.  We feel that keyword symbols are the appropriate
objects to use for these three kinds of names.

No particular page -- We agree with the deprecation or deletion of the two
particular character attributes defined by CLtL, but not with the
deprecation of the whole concept of character attributes.  In fact on page
20 you say "characters are uniquely distinguished by their codes," which
makes it impossible to have character attributes at all.  The language must
define how conforming programs should be written so that they will work
both in implementations with character attributes and in implementations
without them.  For example, the value of (eql x (code-char (char-code x)))
is unspecified.  Another thing that needs to be said is that the exact
character operations (char=, string=, etc.) respect all character
attributes, while the inexact character operations (char-equal,
string-equal, etc.) respect or ignore each character attribute in an
implementation-defined but consistent fashion.  Some of what you say on
page 44 about attributes in general needs to be part of the spec, not
deprecated.  I would retain everything on that page except for INT-CHAR and
the last bullet (referring to bits and fonts), and I would add a remark
that FIND-SYMBOL and INTERN respect character attributes.  If you want,
perhaps I or someone else at Symbolics can provide exact text for what
to say about character attributes that you could insert into your report.

No particular page -- On the subject of defining character registries in a
separate document, and relating them to ISO standards for character
encoding: I think that's fine.  I don't see anything wrong with introducing
the concept of character registry and the requirement that each character
object relates to exactly one registry.  However, I think the somewhat
random list of character registries on pages 7-8 and again on page 21 does
not belong in the language specification.  Even the names of the
standardized character registries belong in the character registry
standard, not in the Common Lisp language standard.  I'm confused about the
meaning of BASE, STANDARD, and CONTROL as character registry names; these
are mentioned in your report but not explained very well.  If these are
character registries that are required to exist in all Common Lisp
implementations, then unlike the others they do belong in the Common Lisp
language standard, not in the character registry standard.

At the meeting there was some discussion about the issue of enumerating all
characters in a character registry.  People claimed incorrectly that it was
impossible.  In fact it's possible to do this, with questionable
efficiency, by the following program:

  (dotimes (code char-code-limit)
    (let ((char (code-char code)))
      (when char
        (when (eq (char-registry-name char) desired-registry-name)
          ... process this char ...))))

Of course you have to change the EQ to EQUALP if you continue to use
strings to name character registries.  For more efficiency, you could add
a way to iterate over all the codes in one character registry, but I think
that is unnecessary.
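
A runnable variant of the loop, as a sketch only: CHAR-REGISTRY-NAME is not
available outside the proposal, so an arbitrary predicate stands in for the
registry test, and MAP-CHARS-SATISFYING and its LIMIT argument are invented
names:

    (defun map-chars-satisfying (predicate function
                                 &optional (limit char-code-limit))
      ;; Call FUNCTION on every character whose code is below LIMIT
      ;; and for which PREDICATE returns true.
      (dotimes (code limit)
        (let ((char (code-char code)))
          (when (and char (funcall predicate char))
            (funcall function char)))))

    ;; Collect the decimal digits among the first 128 codes.
    (let ((digits '()))
      (map-chars-satisfying #'digit-char-p
                            (lambda (c) (push c digits))
                            128)
      (nreverse digits))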


TYPOS:

25 -- base-string is missing from the Table 4-1 amendment.

26 -- general-string is not an array of BASE characters; also, the first
two paragraphs under A.4.8 are garbled (the two separate sentences for
strings for symbols got smushed together).

37 -- This says the default for the :ELEMENT-TYPE option to MAKE-STRING
is SIMPLE-STRING.  Actually it's CHARACTER.


VOTING:

You asked for suggestions on how to modularize the voting.  Here is a
possible breakdown into separate issues, admittedly not very well thought
out.  In general, I feel that breaking down the voting into separate issues
is a good idea even if some of the issues are interdependent.  If people
don't understand the interdependencies well enough to vote properly, then I
would claim that the subcommittee report hasn't explained things well
enough.

Concept of character registries and character labels
Functions and variables for character registries and character labels
Page 9 rules for implementation-defined character registries
New syntax for #\
New syntax for the CHARACTER type specifier, new argument to CHARACTERP
New rules for names of symbols
Deprecation of STRING-CHAR
Deprecation of STANDARD-CHAR
BASE-CHARACTER type
New meaning of STRING type, and its subtypes
SCHAR
SBCHAR
SGCHAR
Type returned by (CONCATENATE 'STRING ...) and similar forms
Extensions to COERCE
EXTERNAL-WIDTH function
:ELEMENT-TYPE option to OPEN
:EXTERNAL-CODE-FORMAT option to OPEN
CHAR-CODED-CHARACTER-SET-VALUE function
Rules for implementation-defined character attributes
Removal or deprecation of bits and fonts, and implied arglist changes
Removal or deprecation of the semi-standard format effector characters
Support of extended characters in the readtable
Removal or deprecation of CHAR-INT
Removal or deprecation of INT-CHAR
Miscellaneous other and editorial changes

Wow, 26 ballot items!  Democracy on the march!  I think the complexity of
the issues amply justifies having about that many separate issues, though.

∂31-Jan-89  0132	CL-Characters-mailer 	Comments on the Character proposal dated January 1, 1989  
Received: from lucid.com by SAIL.Stanford.EDU with TCP; 31 Jan 89  01:32:31 PST
Received: from bhopal ([192.9.200.13]) by heavens-gate.lucid.com id AA03964g; Tue, 31 Jan 89 01:27:03 PST
Received: by bhopal id AA14594g; Tue, 31 Jan 89 01:23:28 PST
Date: Tue, 31 Jan 89 01:23:28 PST
From: Jon L White <jonl@lucid.com>
Message-Id: <8901310923.AA14594@bhopal>
To: Moon@STONY-BROOK.SCRC.Symbolics.COM
Cc: Baggins@IBM.COM, CL-Characters@SAIL.STANFORD.EDU, X3J13@SAIL.STANFORD.EDU,
        Common-Lisp-Implementors@STONY-BROOK.SCRC.Symbolics.COM,
        KMP@STONY-BROOK.SCRC.Symbolics.COM,
        Palter@STONY-BROOK.SCRC.Symbolics.COM
In-Reply-To: David A. Moon's message of Tue, 24 Jan 89 14:46 EST <19890124194625.1.MOON@EUPHRATES.SCRC.Symbolics.COM>
Subject: Comments on the Character proposal dated January 1, 1989

re: 

  Date: Tue, 24 Jan 89 14:46 EST
  From: David A. Moon <Moon@STONY-BROOK.SCRC.Symbolics.COM>
  Subject: Comments on the Character proposal dated January 1, 1989

  . . .   what to do about the
  SCHAR, SBCHAR, and SGCHAR mess.  First of all, you only need two functions,
  not three, because there are only two specified specialized representations.
  SCHAR should be for what I've called SIMPLE-STRING, SBCHAR should be
  for SIMPLE-BASE-STRING, and SGCHAR is not needed.  (In fact I would prefer
  to remove all of the specialized versions of AREF from the language, in
  favor of THE or type declarations, but I know that would only pass over
  some people's dead bodies so I won't push it.)

  . . .

You might be surprised at the kinds of people who really don't care
about whether or not names for the specialized versions of AREF are 
defined in the portable language.  Me, for one.

But what I am very leery about is a definition of an important data 
type that is too wishy-washy to be portable.  SIMPLE-BASE-STRING may be
in this category.  By way of explanation, let me draw the parallel with
some numeric types.

Ostensibly, FIXNUM is in this non-portable category, since up until the 
Hawaii meeting, there wasn't even any requirement in the language that the
type FIXNUM mean anything at all (i.e., it could be the null subset of 
INTEGER).  At least now (TYPEP x 'FIXNUM) will imply that <x> is useful 
for array/vector indices.  For example, the following very reasonable
program might work in implementation A, but fail in implementation B,
*** merely because of the variance of the FIXNUM datatype between the
two implementations ***:

    (defun fills-out-p (i b)
      ;; Ascertain whether bit 'i' of the bit vector 'b' is the same as
      ;;  all the remaining bits (in the higher bit positions).
      (check-type i fixnum)
      (check-type b simple-bit-vector)
      (locally (declare (fixnum i) (simple-bit-vector b))
        ;; Declarations for "fast, open-coding"
        (or (>= i (length b))
            (null (position (logxor 1 (aref b i))   ;bit 'i', "inverted"
                            b
                            :start (the fixnum (1+ i)))))))

    (setq i (position 1 <long-bit-vector>))            ==> say, 100000

    (fills-out-p i <long-bit-vector>)

Given our resolution passed in Hawaii about "tightening" the definition
of the type FIXNUM, the above program is now fully portable.  True,
there are still some areas of the type FIXNUM that aren't portable;
but the situation isn't nearly as bad as it was before.  

Contrast this with the situation regarding the type FLOAT.  Although
there are many aspects of non-portability regarding the _use_ of
floating point numbers, there is no permitted variance in the definition
of the type FLOAT.  It is never permissible, for example, for one
implementation to implement the FLOAT datatype as lists of integers,
and another to implement it as some low-level primitive datatype.
Thus if a user's "declaration" (CHECK-TYPE X FLOAT) fails in one
implementation, but works in another, it is not due to an inherent
weakness in the specification of the type FLOAT.

As I pointed out during the Hawaii meeting, the types SIMPLE-BASE-STRING
and SIMPLE-GENERAL-STRING contain all the seeds of disaster that the
original definition of FIXNUM did.  I would almost rather not see them 
put into the language at all if there isn't some minimal statement of 
portability about them.  The question is:  "Is it worth it to make these 
types be part of the portable language?".  The answer to that will depend 
on whether or not there are serious contenders for "optimization".  It
is my current belief that Lucid's optimization strategies for STRINGs are
_not_ particularly dependent on having a distinction between SIMPLE-STRING
and SIMPLE-BASE-STRING.  I.e., we considered the problem some time ago, 
and decided that we could "swallow" a union type, if necessary, for
SIMPLE-STRING, given that the union was only concerning one of two
primitive element types.

So, now, the question is:  What implementations -- current, or seriously
conceived -- would benefit greatly from a portable definition of the
type SIMPLE-BASE-STRING?  Let them come forward now, or for the next
few years hold their peace.



-- JonL --

∂31-Jan-89  1357	CL-Characters-mailer 	Comments on the Character proposal dated January 1, 1989  
Received: from STONY-BROOK.SCRC.Symbolics.COM (SCRC-STONY-BROOK.ARPA) by SAIL.Stanford.EDU with TCP; 31 Jan 89  13:57:06 PST
Received: from EUPHRATES.SCRC.Symbolics.COM by STONY-BROOK.SCRC.Symbolics.COM via CHAOS with CHAOS-MAIL id 531105; Tue 31-Jan-89 16:53:00 EST
Date: Tue, 31 Jan 89 16:53 EST
From: David A. Moon <Moon@STONY-BROOK.SCRC.Symbolics.COM>
Subject: Comments on the Character proposal dated January 1, 1989
To: Jon L White <jonl@lucid.com>
cc: Baggins@IBM.COM, CL-Characters@SAIL.STANFORD.EDU, X3J13@SAIL.STANFORD.EDU,
    Common-Lisp-Implementors@STONY-BROOK.SCRC.Symbolics.COM, KMP@STONY-BROOK.SCRC.Symbolics.COM,
    Palter@STONY-BROOK.SCRC.Symbolics.COM
In-Reply-To: <8901310923.AA14594@bhopal>
Message-ID: <19890131215325.7.MOON@EUPHRATES.SCRC.Symbolics.COM>

Maybe I shouldn't reply to this, since the content doesn't seem to be
directed to me, but on the other hand I was the only non-CC recipient.

    Date: Tue, 31 Jan 89 01:23:28 PST
    From: Jon L White <jonl@lucid.com>
    ....
    But what I am very leery about is a definition of an important data 
    type that is too wishy-washy to be portable.  SIMPLE-BASE-STRING may be
    in this category....

    As I pointed out during the Hawaii meeting, the types SIMPLE-BASE-STRING
    and SIMPLE-GENERAL-STRING contain all the seeds of disaster that the
    original definition of FIXNUM did.

I'm afraid I don't understand your comment.  What's inadequately specified
about SIMPLE-BASE-STRING?  Is the problem with the SIMPLE part or with
the BASE part, i.e. are you complaining that SIMPLE arrays aren't well
enough specified, or that the BASE-CHARACTER element-type isn't well
enough specified, or something else?

    It
    is my current belief that Lucid's optimization strategies for STRINGs are
    _not_ particularly dependent on having a distinction between SIMPLE-STRING
    and SIMPLE-BASE-STRING.  I.e., we considered the problem some time ago, 
    and decided that we could "swallow" a union type, if necessary, for
    SIMPLE-STRING, given that the union was only concerning one of two
    primitive element types.

I find this rather surprising, although it's really not my place to question
it.  You really don't generate just a move.b instruction followed by some kind
of instruction to add in the type bits, on 68000's, for aref of a simple
1-dimensional array of base characters, at highest speed optimization level
and lowest safety optimization level?  Instead you generate some kind of type
check followed by branches to either a move.b or a move.l depending on the
string's element type?

Answering the first part is much more important than answering the second
part, in fact maybe I didn't really care about the second part anyway.

∂02-Feb-89  1308	CL-Characters-mailer 	Really about TYPEP failures: Comments on the Character proposal dated January 1, 1989   
Received: from lucid.com by SAIL.Stanford.EDU with TCP; 2 Feb 89  13:08:08 PST
Received: from bhopal ([192.9.200.13]) by heavens-gate.lucid.com id AA03280g; Thu, 2 Feb 89 13:02:19 PST
Received: by bhopal id AA29703g; Thu, 2 Feb 89 13:04:21 PST
Date: Thu, 2 Feb 89 13:04:21 PST
From: Jon L White <jonl@lucid.com>
Message-Id: <8902022104.AA29703@bhopal>
To: Moon@STONY-BROOK.SCRC.Symbolics.COM
Cc: Baggins@IBM.COM, CL-Characters@SAIL.STANFORD.EDU, X3J13@SAIL.STANFORD.EDU,
        Common-Lisp-Implementors@STONY-BROOK.SCRC.Symbolics.COM,
        KMP@STONY-BROOK.SCRC.Symbolics.COM,
        Palter@STONY-BROOK.SCRC.Symbolics.COM
In-Reply-To: David A. Moon's message of Tue, 31 Jan 89 16:53 EST <19890131215325.7.MOON@EUPHRATES.SCRC.Symbolics.COM>
Subject: Really about TYPEP failures: Comments on the Character proposal dated January 1, 1989

re: I'm afraid I don't understand your comment.  What's inadequately 
    specified about SIMPLE-BASE-STRING?  

Sorry, I wasn't clear enough here.  The phrase used was "wishy-washy", 
and it didn't mean that the specification was unclear or something; 
rather, it called in question the enforcement of the specification.  As 
I understand it, it's entirely permissible for one implementation to
merge SIMPLE-BASE-STRING and SIMPLE-GENERAL-STRING into one type,
whereas another may make them type-disjoint.
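
A quick probe for this variance, as a sketch: it assumes the implementation
provides a SIMPLE-BASE-STRING type specifier under that name, and the
function name is invented:

    (defun general-string-is-base-p ()
      ;; Make a string specialized for CHARACTER and ask whether it also
      ;; satisfies SIMPLE-BASE-STRING.  True where the implementation
      ;; merges the two representations, false where they are disjoint.
      (typep (make-string 3 :element-type 'character
                            :initial-element #\a)
             'simple-base-string))

A program that branches on this TYPEP result behaves differently in the two
kinds of implementation, which is the portability hazard at issue.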

This is a problem not specific to SIMPLE-BASE-STRING etc, but also applies
to any "wishy-washiness" about the disjointedness of types, where
there is good reason to believe that some implementations will truly
utilize that disjointedness.  Recall my comment about the situation 
with the FLOAT datatype:

    Contrast this with the situation regarding the type FLOAT.  Although
    there are many aspects of non-portability regarding the _use_ of
    floating point numbers, there is no permitted variance in the definition
    of the type FLOAT.  It is never permissible, for example, for one
    implementation to implement the FLOAT datatype as lists of integers,
    and another to implement it as some low-level primitive datatype.
    Thus if a user's "declaration" (CHECK-TYPE X FLOAT) fails in one
    implementation, but works in another, it is not due to an inherent
    weakness in the specification of the type FLOAT.

Suppose for the moment that one implementation merges the FLOAT and CONS 
datatypes by implementing FLOATs as a list of three integers (such as the 
values returned by integer-decode-float), but another implementation 
makes them disjoint as "primitive" types.  At first blush, one might want 
to dismiss this case as simply an "efficiency" concern for the second 
implementation.  But consider the problems for someone developing the 
following program on the first implementation and delivering it on the 
second:

     (defun foo (x) 
       (if (typep x 'float)
           (sin x)
           (error "Must have a float")))

     (foo (list 1 0 1))                  ;; knowing that (float 1.0) is ...

This is a valid program in the first implementation, because the list
(1 0 1) is a valid FLOAT in that implementation.  But when moving it to 
the second implementation, you get a bug.  Now, implementing FLOATs as 
LISTs may look artificial, but this example exactly parallels one of the 
more annoying porting problems that some 3600 users have when going into
virtually any of the stock hardware implementations.  See footnote below.


Recall the recent cleanup proposal to "tighten up" the definition of
   FIXNUM so that its use will more frequently be portable.

Recall also the recent cleanup issue to tighten the FUNCTION type so
   that it is no longer ambiguous with SYMBOL.

Recall also the CLOS need to tighten up the disjointedness of the types
   found on CLtL p.31 (along with others too); 88-002R, p.1-17.

Recall the trouble we've had with the ambiguity in the alternatives 
   for SHORT-FLOAT, SINGLE-FLOAT, DOUBLE-FLOAT, and LONG-FLOAT.


Given this long history of misguided permissiveness, and our frequent
need to "tighten things up", don't you think it would be a mistake to start 
out with the types SIMPLE-BASE-STRING and SIMPLE-GENERAL-STRING ambiguous?


You also asked about how Lucid might implement open-coding of SGCHAR and 
SBCHAR.  This is immaterial.  I only meant to say that *if* the character
proposal were to require (or suggest) that SIMPLE-STRING be a union of
two primitive types, we could "swallow" that (possibly with a little gulping
along the way).  Dave Unietis of Lucid, who has been participating in the 
Character subcommittee's deliberations, will probably make a more detailed 
commentary soon as to what we expect of the Character Proposal.



-- JonL --


Footnote:

Actually, the FLOATs-as-LISTs example is of more than passing interest,
because if you change the names appropriately, you will see exactly
the same problem that certain users have when porting code from the 3600
implementation to *any* stock hardware implementation that does serious
open-coding of AREF.  Note that the only function I've mentioned is
TYPEP -- not hokey-pokey-adjust-array or whatever; that's why all the
thousands of lines of discussion about adjust-array etc on cl-cleanup are 
not germane to the real problem.  Focusing on adjust-array can, at best, 
emasculate some of its mandated error checking; at worst, it can confuse 
some would-be heroes into thinking that the type disjointedness in question
is merely a matter of semantics that can be cleared up by clever wording.  
Make no mistake.  The disjointedness is essential to the optimization 
strategies of *all* the commercial stock hardware implementations
represented at the Hawaii meeting.  

∂07-Feb-89  1305	CL-Characters-mailer 	characters proposal    
Received: from ti.com by SAIL.Stanford.EDU with TCP; 7 Feb 89  13:05:09 PST
Received: by ti.com id AA27344; Tue, 7 Feb 89 15:04:05 CST
Received: from Kelvin by tilde id AA17771; Tue, 7 Feb 89 14:55:25 CST
Message-Id: <2811876853-716042@Kelvin>
Sender: GRAY@Kelvin.csc.ti.com
Date: Tue, 7 Feb 89  14:54:13 CST
From: David N Gray <Gray@DSG.csc.ti.com>
To: CL-Characters@SAIL.Stanford.edu
Cc: Bartley@MIPS.csc.ti.com, Waldrum@Tilde.csc.ti.com
Subject: characters proposal

I have read the document titled "Extensions to Common LISP to Support
International Character Sets" dated January 1, 1989, and feel that it is
not much of an improvement over what we saw in October.  Following are
some random comments about things I happened to notice; this is not
intended to be a comprehensive analysis.

First, documents such as this ought to be labelled with an X3J13
document number so that they can be referred to conveniently and
unambiguously.

"Appendix A" and "Appendix B" really should be chapters 3 and 4 since
they are an essential part of the proposal, rather than being an
appendage to it.

Page 7 says that the definition of semi-standard-characters "is replaced
by a more uniform approach with introduction of the Control Character
Registry".  Do you really mean that it _will_be_ replaced when the
Control Character Registry is defined in some subsequent document?  I
certainly don't see anything in this document that could be considered a
replacement.

This whole concept of registries seems rather strange.  Is the intent
that the alphabetic characters of the standard characters are to be in
the "Latin" registry while characters such as period and comma are in
"Latin-Punctuation"?   Is #\NEWLINE in the "Control" registry?  Where do
the digits go -- "Mathematical"?  Is #\- a "Latin-Punctuation" or a
"Mathematical"?  Which registry is #\SPACE in?  Now tell me what to do
with the extra non-Latin alphabetic characters used in Swedish?  Does
that require a separate registry for just those additional characters?
Now we have simple text in a single language using characters from at
least four different registries.  Do you really think it possible to
agree on a "fixed", non-extensible, set of "Mathematical" or "Pattern"
characters?

Page 9 says that an implementation needs to specify the total ordering
of characters within each registry, but what about the ordering of
characters in different registries?  Is that completely undefined?

Page 25 section A.4.5 doesn't specify the syntax of a registry name; did
you intend it to be a string?

Page 27 has an example using  (typep x '(character "standard"))  but
page 25 said that had to be a registry name; "standard" is not a
registry name.

Page 29 - *ALL-REGISTER-NAMES* -- a list of strings?

Page 33 -- FIND-CHAR -- does the index value within a registry have any
portable meaning?  Is that intended to be specified for the standard
registries?  Is "base" supposed to be accepted here?  If not, how can
you access the base codes?  If I were going to construct a character
from its index value, it would be more meaningful to use an index
relative to some coded character set rather than these registries.

Page 36, the last sentence doesn't make sense.  The default for
:ELEMENT-TYPE would have to be either CHARACTER or BASE-CHARACTER.

Page 37, section A.22.1.1 -- the part being deleted specifies the
meaning of including tab and form-feed characters in a Common Lisp
source file; do you really intend that to not have any standard meaning?
If my editor uses tabs for indenting, does that mean that the resulting
source file is not a standard-conforming program?

Page 38, the first reference to p360 of CLtL should be p353; the
deletion here says that there shall not be any standard name for the
commonly used control characters such as tab and form-feed.  That still
seems wrong to me.

Page 41, what's the point of appending "ccs" to the name of the
standard?  Presumably that stands for "coded character set", but isn't
that adequately implied by the fact that this string will follow the
keyword :EXTERNAL-CODE-FORMAT ?   The use of "default" seems odd since
:DEFAULT is used everywhere else.  

I agree with Moon that the excising of bits and fonts has not been done
carefully enough for them to be compatible extensions.


P.S.  If you are still having trouble replying to me, this address
should work:  GRAY%DSG.CSC.TI.COM@RELAY.CS.NET

∂23-Feb-89  1014	CL-Characters-mailer 	subcommittee meeting at fairfax  
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 23 Feb 89  10:14:03 PST
Date: Thu, 23 Feb 89 09:39:02 PST
From: Thom Linden <baggins@IBM.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <890223.093902.baggins@almvma>
Subject: subcommittee meeting at fairfax

I won't be in DC until Tuesday afternoon.  I've asked for the
character proposal discussion for Wednesday (4 hrs).  We could
get together at the Marriott Tuesday evening?

Regards,
  Thom

∂27-Feb-89  1855	CL-Characters-mailer 	characters committee   
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 27 Feb 89  18:55:10 PST
Date: Mon, 27 Feb 89 11:57:20 PST
From: Thom Linden <baggins@IBM.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <890227.115720.baggins@almvma>
Subject: characters committee

At this point, I consider the work of the characters committee coming
to completion with the March meeting.  As long as there is
no work item lingering after the meeting (and that is my intent),
we have no additional subcommittee work regarding characters handling
for the 1989 draft CL standard.
My thanks to all of you for your participation, comments and
critiques.

  Thom

∂28-Feb-89  1949	CL-Characters-mailer 	Re: cs proposal comments    
Received: from ti.com by SAIL.Stanford.EDU with TCP; 28 Feb 89  19:49:40 PST
Received: by ti.com id AA00427; Tue, 28 Feb 89 21:48:26 CST
Received: from Kelvin by tilde id AA21153; Tue, 28 Feb 89 21:36:41 CST
Message-Id: <2813715354-5499176@Kelvin>
Sender: GRAY@Kelvin.csc.ti.com
Date: Tue, 28 Feb 89  21:35:54 CST
From: David N Gray <Gray@DSG.csc.ti.com>
To: Thom Linden <baggins@IBM.COM>
Cc: CL-Characters@SAIL.Stanford.edu, Bartley@MIPS.csc.ti.com,
        Waldrum@Tilde.csc.ti.com
Subject: Re: cs proposal comments
In-Reply-To: Msg of Wed, 22 Feb 89 04:51:15 PST from Thom Linden <baggins@IBM.COM>

> >>   This whole concept of registries seems rather strange.  Is the intent
> >>   that the alphabetic characters of the standard characters are to be in
> >>   the "Latin" registry while characters such as period and comma are in
> >>   "Latin-Punctuation"?   Is #\NEWLINE in the "Control" registry?  Where do
> >>   the digits go -- "Mathematical"?  Is #\- a "Latin-Punctuation" or a
> >>   "Mathematical"?  Which registry is #\SPACE in?  Now tell me what to do
> >>   with the extra non-Latin alphabetic characters used in Swedish?  Does
> >>   that require a separate registry for just those additional characters?
> >>   Now we have simple text in a single language using characters from at
> >>   least four different registries.  Do you really think it possible to
> >>   agree on a "fixed", non-extensible, set of "Mathematical" or "Pattern"
> >>   characters?
> 
>   Actually, I believe the simplicity of the registry framework will make
> agreement easy.  Currently, members of the coded character set
> committees spend vast amounts of time lobbying for inclusion of their
> favorite character(s) in the 'popular' coded character set standard.
> The effect of not being included means fewer installations will
> support their native language properly.

You mean that, since the registry just defines names and doesn't require
all the characters to be implemented, no hard choices need to be made
about what to leave out?  OK.

>   I think a new group, hopefully formed within
> programming languages, should define the registries rather than
> the existing coded character set committees.  There is no competition
> between registries, ie. no advantage of one over another.  What this
> committee has to agree upon is 1) a useful set of registry names and
> 2) definition of the constituents of each registry.  The only argument
> I would anticipate is "are the semantics of my alpha the same
> or different from your alpha" type debates.

But if we decide that "my alpha" _is_ the same as "your alpha", then which
of our languages' registries gets to include it?  I can see a lot of
confusion over characters that are used the same in more than one
language.

I'm not so much concerned with who decides this as with whether this
approach is even feasible.  The lack of even a "for example" scenario of
how this might work leaves me with a lot of doubts about whether it _can_
work.  Also, it is not apparent why anyone outside the Common Lisp
community would have any interest in participating in such a standards
effort.

>   By the way,
> the registries are fixed only in that a Common LISP implementation
> cannot modify the standard definitions.  This guarantees an application
> program can portably rely on the composition and decomposition
> functions to establish the availability of any given character.

Page 7 of the draft dated February 21 says that 

  "The proposed ISO Character Registry Standard is fixed; an 
   implementation may not extend a standard registry's constituent 
   set of characters beyond the standard definition."

That says to me that if an implementation is going to add a character,
then it can only be added to an implementation-defined registry.  What
happens then if a new edition of the registry standard includes that
character in one of the standard registries?  Since a character is not
permitted to be a member of more than one registry, that immediately
becomes an incompatible change for anyone who has been using that
character.  Consequently, even extensions by standards revision will be
discouraged.  That seems quite non-extensible to me.

> >>   Page 9 says that an implementation needs to specify the total ordering
> >>   of characters within each registry, but what about the ordering of
> >>   characters in different registries?  Is that completely undefined?
> 
> There is no ordering of characters within registries.  As mentioned
> in Hawaii, the character index (a number) was changed to character
> label (a symbol) throughout the proposal.

So CHAR< etc. have no portable meaning unless both arguments are standard
characters in one of the partial ordering groups on page 237 of CLtL?
If you are going to have a Greek alphabet that is required to be disjoint
from the Latin alphabet anyway, wouldn't you want to know that the Greek
letters could be sorted in the expected order?
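
For instance (a sketch using only standard characters; the function name is
invented), sorting with CHAR< is portable here only because CLtL guarantees
the relative order of the lowercase letters:

    (defun sort-letters (string)
      ;; Non-destructive sort of STRING by CHAR<.
      (sort (copy-seq string) #'char<))

    (sort-letters "delta")

If I read the reply correctly, the same call on a string of Greek letters
from a separate registry would have no portably defined result.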

> >>   Page 25 section A.4.5 doesn't specify the syntax of a registry name; did
> >>   you intend it to be a string?
> 
> These have been changed to be symbols.

Fine, but it appears that the new draft still doesn't say that.

> >>   Page 29 - *ALL-REGISTER-NAMES* -- a list of strings?
> 
> Now a list of symbols.

Again, the document doesn't say that.

> >>   Page 33 -- FIND-CHAR -- does the index value within a registry have any
> >>   portable meaning?  Is that intended to be specified for the standard
> >>   registries?  Is "base" supposed to be accepted here?  If not, how can
> >>   you access the base codes?  If I were going to construct a character
> >>   from its index value, it would be more meaningful to use an index
> >>   relative to some coded character set rather than these registries.
> 
> FIND-CHAR takes a character label and registry.  These are specified
> by the registry standard.  Base is not a registry name.  We have
> introduced a new function CHAR-CCS-VALUE which takes a character
> object and a coded character set name (a symbol) and returns the
> encoding of the character in the coded character set.

That sounds good, but don't we also need the inverse function, to
construct a character object given a CCS and index?

> >>   Page 37, section A.22.1.1 -- the part being deleted specifies the
> >>   meaning of including tab and form-feed characters in a Common Lisp
> >>   source file; do you really intend that to not have any standard meaning?
> >>   If my editor uses tabs for indenting, does that mean that the resulting
> >>   source file is not a standard-conforming program?
> 
> That really depends on the definition of a conforming program. Is
> this defined yet?

Never mind that; the real question is why do you want the standard to not
specify the meaning of tabs and form-feeds in source files?

---

This document is too big for me to have time to read the whole thing right
now [doesn't TeX support change bars?], but here are a few more comments
from quickly glancing through the February 21st draft:

In the character table on page 17, do the "graphic labels" have any
significance?  I don't see that the document uses them or requires them to
be used in any way.  If not, that column should be deleted.  I hope that
this is _not_ an example of what character names in a registry would look
like.

Your message of January 24 said you were going to:

  -- modify char-name, name-char, and #\name  to accept character
       names of the form 'registry:label'

but I can't see that this draft does that.  In particular, it is not at
all apparent how this is supposed to affect CHAR-NAME.  Should I expect
(CHAR-NAME #\NEWLINE) to return "NEWLINE" or something like
"CONTROL:NEWLINE"?  Are #\SPACE and #\NEWLINE to be the only characters
that can be referenced by a name that does not include a registry prefix?
Since all characters will have a label in some registry, does that mean
that CHAR-NAME will never return NIL anymore?

∂28-Feb-89  2020	CL-Characters-mailer 	Re: cs proposal straw vote  
Received: from ti.com by SAIL.Stanford.EDU with TCP; 28 Feb 89  20:19:46 PST
Received: by ti.com id AA00500; Tue, 28 Feb 89 22:18:18 CST
Received: from Kelvin by tilde id AA21544; Tue, 28 Feb 89 22:08:43 CST
Message-Id: <2813717283-5615057@Kelvin>
Sender: GRAY@Kelvin.csc.ti.com
Date: Tue, 28 Feb 89  22:08:03 CST
From: David N Gray <Gray@DSG.csc.ti.com>
To: Thom Linden <baggins@IBM.com>
Cc: CL-Characters@SAIL.Stanford.edu, Bartley@MIPS.csc.ti.com,
        Waldrum@Tilde.csc.ti.com
Subject: Re: cs proposal straw vote
In-Reply-To: Msg of Wed, 22 Feb 89 12:08:15 PST from Thom Linden <baggins@IBM.com>

>   I would like to take a straw vote on various components of
> the Characters proposal.  The primary intent is to resolve the
> actual list of items to be voted upon at the March meeting.
> Let me know if you think some items should be separated or
> combined
...
> Issue: CHAR-FONT-UNUSED-CHAR-BITS-NONPORTABLE
> Problem: CHAR-FONT isn't used, CHAR-BITS isn't portable.
> Proposal:
>           Eliminate font and bit attributes.
>           Add rules for an implementation supporting attributes.
>           Redefine STRING-CHAR as implementation defined.
>           Remove CHAR-FONT-LIMIT
>           Remove CHAR-BITS-LIMIT
>           Remove INT-CHAR
>           Remove CHAR-BITS
>           Remove CHAR-FONT
>           Remove MAKE-CHAR
>           Remove CHAR-CONTROL-BIT
>           Remove CHAR-META-BIT
>           Remove CHAR-SUPER-BIT
>           Remove CHAR-HYPER-BIT
>           Remove CHAR-BIT
>           Remove SET-CHAR-BIT

Yes, I can accept this.
---

> Issue: CHAR-INT-ONLY-USEFUL-WHEN-ATTRIBUTES-SUPPORTED
> Problem: CHAR-INT behavior is CHAR-CODE unless implementation
>   defined attributes are supported.
> Proposal:
>           Remove CHAR-INT

I had to stop and think about why this wasn't part of the previous issue.
Perhaps the thought was that a portable way to turn all of a character into
a number (e.g. for a hash code) would be desirable even if only some
implementations support attributes?  That sounds like a legitimate
concern, so I vote No.
                   --
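
A minimal sketch of the hash-code use mentioned above, assuming CHAR-INT
stays in the language. The function name STRING-HASH and the multiplier 31
are illustrative choices, not part of any proposal.

```lisp
;; CHAR-INT encodes the whole character, including any
;; implementation-defined attributes, so two characters that are not
;; CHAR= contribute different integers to the hash.
(defun string-hash (string)
  "Fold every character of STRING into a non-negative integer below
MOST-POSITIVE-FIXNUM, using CHAR-INT."
  (let ((hash 0))
    (dotimes (i (length string) hash)
      (setf hash (mod (+ (* hash 31) (char-int (char string i)))
                      most-positive-fixnum)))))

(string-hash "CHAR-INT")
```

An implementation without attributes would get the same result from
CHAR-CODE; the point of CHAR-INT is that the code above stays correct on
implementations that do have them.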

> Issue: CHARACTER-TYPE-RESTRICTIVE
> Problem: CHARACTER type doesn't allow thin & fat characters.
> Proposal:
>           Define BASE-CHARACTER as a subtype of STRING.
>           Standard characters are a subset of the base
>              characters.
>           STANDARD-CHAR type is replaced by (CHARACTER :STANDARD)
>           Remove the semi-standard characters.

I have been unable to imagine the reason for linking semi-standard
characters with fat characters; these should be separate issues.

Yes to fat characters.
---
No to removing the semi-standard characters.  I still have yet to hear a
--    plausible rationale for doing this.

> Issue: STRING-TYPE-RESTRICTIVE
> Problem: STRING type doesn't allow thin & fat strings.
> Proposal:
>           Define STRING as a union type
>           STRING used as a type specifier for object creation
>              means (VECTOR CHARACTER)
>           All string functions operate as specified on any
>              string object except it is an error to insert
>              an extended character into a base string.
>           Extend the COERCE function to allow coercion from
>             base string to extended string.

Yes.
---
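
A minimal sketch of the COERCE extension in the last proposal item,
assuming an implementation where COERCE accepts an array type specifier
for strings. WIDEN-STRING is a hypothetical helper name.

```lisp
;; Copy (or return) STRING as a simple string whose element type is
;; CHARACTER, so that any character may later be stored into it.
(defun widen-string (string)
  "Return a string with the contents of STRING that can hold any character."
  (coerce string '(simple-array character (*))))

(widen-string "thin")
```

If the argument already has the requested type, COERCE may return it
unchanged, so callers should not rely on getting a fresh copy.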

> Issue: STRING-TYPE-ABBREVIATIONS
> Problem: new types are awkward to name, want abbreviations.
> Proposal:
>           Add BASE-STRING
>           Add GENERAL-STRING

Yes.
---

> Issue: SIMPLE-STRING-TYPE-RESTRICTIVE
> Problem: SIMPLE STRING type doesn't allow thin & fat strings.
> Proposal:
>           Define SIMPLE-STRING as a union type
>           Define SIMPLE-STRING as a type specifier for object
>              creation means (SIMPLE-ARRAY CHARACTER (size))

Yes.
---
> Issue: SIMPLE-STRING-TYPE-ABBREVIATIONS
> Problem: new types are awkward to name, want abbreviations.
> Proposal:
>           Add SIMPLE-BASE-STRING
>           Add SIMPLE-GENERAL-STRING

Yes.
---
> Issue: FILE-EXTERNAL-REPRESENTATION
> Problem: can't specify external encoding even when there are lots
> Proposal:
>           Add :EXTERNAL-CODED-CHARACTER-FORMAT keyword to OPEN

Yes.
---
> Issue: STRING-BINARY-WIDTH
> Problem: Can't find out how many bytes a string will take when written as
> text
> Proposal:
>           Add :EXTERNAL-CODED-STRING-LENGTH function

No; I'm not sure that this has been adequately thought out.
--
> Issue: CHAR-CODE-NON-PORTABLE
> Problem: no way to talk about well-known external coding methods, only
> internal codes
> Proposal:
>           Add CHAR-CCS-VALUE function

Yes, although I'm not too happy about the name; "value" doesn't really say
---  much.  Why not CHAR-CCS-INDEX ?  Or CHAR-EXTERNAL-CODE ?  Yes also to
     the inverse function.

> Issue: CHARACTER-IDENTIFICATION-NONPORTABLE
> Problem: Can't portably talk about non-standard characters
> Proposal:
>            Introduce the concept of Registries
>            Standardize on #\registry:id, add all-implemented-registries
>            Add *ALL-CHARACTER-REGISTRY-NAMES* variable
>            Add FIND-CHAR function
>            Add CHAR-LABEL function
>            Add CHAR-REGISTRY-NAME function
>            New syntax for CHARACTER type specifier
>            New #\label:registry character name syntax
>            New argument to CHARACTERP

No.  I'm not convinced that this approach is feasible or even necessary.
---  There appears to be a great deal of machinery being created solely to
     support the premise stated in section 2.2 that all characters need to
     be decomposable into one and only one name.  I don't see the need for
     that.

∂02-Mar-89  1941	CL-Characters-mailer 	Really about TYPEP failures: Comments on the Character proposal dated January 1, 1989   
Received: from STONY-BROOK.SCRC.Symbolics.COM (SCRC-STONY-BROOK.ARPA) by SAIL.Stanford.EDU with TCP; 2 Mar 89  19:41:42 PST
Received: from EUPHRATES.SCRC.Symbolics.COM by STONY-BROOK.SCRC.Symbolics.COM via INTERNET with SMTP id 549857; 2 Mar 89 22:39:01 EST
Date: Thu, 2 Mar 89 22:39 EST
From: David A. Moon <Moon@STONY-BROOK.SCRC.Symbolics.COM>
Subject: Really about TYPEP failures: Comments on the Character proposal dated January 1, 1989
To: Robert W. Kerns <RWK@FUJI.ILA.Dialnet.Symbolics.COM>
cc: Jon L White <jonl@lucid.com>, Baggins@IBM.COM, CL-Characters@SAIL.STANFORD.EDU,
    X3J13@SAIL.STANFORD.EDU, Common-Lisp-Implementors@Symbolics.COM, KMP@Symbolics.COM,
    Palter@Symbolics.COM
In-Reply-To: <19890204142708.2.RWK@CALVARY.ILA.Dialnet.Symbolics.COM>
Supersedes: <19890303023922.6.MOON@EUPHRATES.SCRC.Symbolics.COM>
Comments: Removed % signs
Message-ID: <19890303033902.9.MOON@EUPHRATES.SCRC.Symbolics.COM>

    Date: Sat, 4 Feb 89 09:27 EST
    From: Robert W. Kerns <RWK@FUJI.ILA.Dialnet.Symbolics.COM>

    Legitimate operations on BASE-CHARACTER do not
    produce characters in (AND CHARACTER (NOT BASE-CHARACTER)).

I'm not sure which programs written using symbols in the LISP package
are "legitimate operations" and which are not.  However, I didn't see
anything in the 21 Feb 89 proposal that says that CHAR-UPCASE of
a BASE-CHARACTER is necessarily a BASE-CHARACTER, for example.
That doesn't sound unreasonable, though; should it be added?

∂09-Mar-89  1340	CL-Characters-mailer 	Character proposal nit-pick 
Received: from Think.COM by SAIL.Stanford.EDU with TCP; 9 Mar 89  13:40:02 PST
Return-Path: <barmar@Think.COM>
Received: from OCCAM.THINK.COM by Think.COM; Thu, 9 Mar 89 16:36:12 EST
Date: Thu, 9 Mar 89 16:37 EST
From: Barry Margolin <barmar@Think.COM>
Subject: Character proposal nit-pick
To: cl-characters@sail.stanford.edu
Cc: chapman%aitg.DEC@decwrl.dec.com
Message-Id: <19890309213740.6.BARMAR@OCCAM.THINK.COM>

On page 22 of the Characters proposal, I'm wondering whether the first
change says what you intend to say.  The wording is "All implementations
provide specialized arrays for the cases when the components are
characters (or optionally, special subsets of the characters)".  By
using "or" in the parenthetical comment, you are saying that an
implementor may choose to provide a specialized array type for a
CHARACTER subset, and choose NOT to provide a specialized array type for
CHARACTER.

I think you intended to say "and" in this sentence, meaning that an
implementation MUST provide a specialized array type for :ELEMENT-TYPE
CHARACTER, and it may optionally provide additional specialized types
for CHARACTER subtypes.  In this case, the parenthetical comment isn't
really necessary, since it is mentioned in other places that an
implementation may provide specialized array types in addition to those
explicitly required by the standard.

                                                barmar
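
For illustration, a small sketch of how one might inspect which
specialized array types an implementation actually provides. The function
name CHARACTER-ARRAY-REPORT is hypothetical; ARRAY-ELEMENT-TYPE and
MAKE-ARRAY are as in CLtL.

```lisp
;; Ask for arrays specialized to CHARACTER and to a CHARACTER subtype,
;; then report the element types the implementation really used.
(defun character-array-report ()
  "Return a property list of the element types chosen for arrays
requested with element types CHARACTER and STANDARD-CHAR."
  (list :character
        (array-element-type
         (make-array 4 :element-type 'character :initial-element #\a))
        :standard-char
        (array-element-type
         (make-array 4 :element-type 'standard-char :initial-element #\a))))

(character-array-report)
```

Under the "and" reading, the :CHARACTER entry must be a character type in
every implementation, while the :STANDARD-CHAR entry may or may not be
narrower.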

∂15-Mar-89  0622	X3J13-mailer 	BASE-CHARACTER  
Received: from Xerox.COM by SAIL.Stanford.EDU with TCP; 15 Mar 89  06:22:19 PST
Received: from Semillon.ms by ArpaGateway.ms ; 15 MAR 89 06:09:24 PST
Date: 15 Mar 89 06:07 PST
From: masinter.pa@Xerox.COM
Subject: BASE-CHARACTER
To: CL-Characters@SAIL.STANFORD.EDU
cc: X3J13@SAIL.STANFORD.EDU
Message-ID: <890315-060924-3563@Xerox>

There were a couple of points that were only tersely alluded to in my note
on the character proposal.

BASE-CHARACTER

I think most of the confusions and problems with BASE-CHARACTER in the
proposal result from its definition in terms of the 'natural' encoding of
an implementation.


I think defining BASE-CHARACTER exactly as

(UPGRADED-ARRAY-ELEMENT-TYPE 'STANDARD-CHAR)

has all of the right properties. (Recall that UPGRADED-ARRAY-ELEMENT-TYPE
was added by the (passed) proposal in ARRAY-TYPE-ELEMENT-TYPE-SEMANTICS.)
This definition is unambiguous and shows the relationship between the
string encoding and array upgrading strategies of the implementation and
the important character types. It ensures STANDARD-CHAR is a subtype of
BASE-CHARACTER. 

Whether a character is "base" really only depends on the way that an
implementation represents strings, and not any other properties of the
implementation or the host operating system. Imagine two implementations on
a Unix machine, one of which encodes all strings as 16-bit characters, and
another which has two kinds of strings: 8-bit strings and 16-bit strings.
In the first implementation, the BASE-CHARACTER is CHARACTER: there's only
one kind of string. In the second implementation, the BASE-CHARACTER would
be those that could be stored in an 8-bit string, and it would be a proper
sub-type of CHARACTER.

To make this change requires leaving STANDARD-CHAR in the standard and then
merely defining BASE-CHARACTER in terms of it. Clearly BASE-STRING, if such
a name is necessary, would then just be a shorthand for (VECTOR
BASE-CHARACTER) with all the semantics implied by the array-element-type
proposals. 
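
A minimal sketch of the suggested definition, assuming
UPGRADED-ARRAY-ELEMENT-TYPE is available as in the passed proposal. The
three function names are hypothetical and only serve the illustration.

```lisp
(defun base-character-type ()
  "The type suggested as BASE-CHARACTER, computed for this implementation."
  (upgraded-array-element-type 'standard-char))

(defun base-character-p (object)
  "True if OBJECT is a character that fits in the smallest string
representation able to hold every STANDARD-CHAR."
  (and (characterp object)
       (typep object (base-character-type))))

(defun single-string-representation-p ()
  "True if every CHARACTER is provably a base character, i.e. the
implementation has only one kind of string."
  (values (subtypep 'character (base-character-type))))

(base-character-p #\a)
(single-string-representation-p)
```

In the two Unix implementations imagined above, the last call would be
true for the one with only 16-bit strings and false for the one with both
8-bit and 16-bit strings.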

∂15-Mar-89  0924	CL-Characters-mailer 	BASE-CHARACTER    
Received: from STONY-BROOK.SCRC.Symbolics.COM (SCRC-STONY-BROOK.ARPA) by SAIL.Stanford.EDU with TCP; 15 Mar 89  09:24:27 PST
Received: from EUPHRATES.SCRC.Symbolics.COM by STONY-BROOK.SCRC.Symbolics.COM via CHAOS with CHAOS-MAIL id 557489; Wed 15-Mar-89 12:22:00 EST
Date: Wed, 15 Mar 89 12:22 EST
From: David A. Moon <Moon@STONY-BROOK.SCRC.Symbolics.COM>
Subject: BASE-CHARACTER
To: CL-Characters@SAIL.STANFORD.EDU
cc: X3J13@SAIL.STANFORD.EDU
In-Reply-To: <890315-060924-3563@Xerox>
Message-ID: <19890315172201.1.MOON@EUPHRATES.SCRC.Symbolics.COM>

After thinking it over, I agree with Larry Masinter's comments
in the referenced message and the suggestion to define BASE-CHARACTER
as (UPGRADED-ARRAY-ELEMENT-TYPE 'STANDARD-CHAR).

∂15-Mar-89  1018	X3J13-mailer 	BASE-CHARACTER  
Received: from Think.COM by SAIL.Stanford.EDU with TCP; 15 Mar 89  10:17:14 PST
Received: from fafnir.think.com by Think.COM; Wed, 15 Mar 89 13:13:21 EST
Return-Path: <gls@Think.COM>
Received: from verdi.think.com by fafnir.think.com; Wed, 15 Mar 89 13:14:40 EST
Received: by verdi.think.com; Wed, 15 Mar 89 13:11:29 EST
Date: Wed, 15 Mar 89 13:11:29 EST
From: Guy Steele <gls@Think.COM>
Message-Id: <8903151811.AA02727@verdi.think.com>
To: CL-Characters@sail.stanford.edu
Cc: X3J13@sail.stanford.edu
Subject: BASE-CHARACTER

Larry's suggestion about defining BASE-CHARACTER to be simply
(UPGRADED-ARRAY-ELEMENT-TYPE 'STANDARD-CHAR) has a great deal
of charm, and I don't see anything wrong about it.
So I support it.
--Guy

∂24-Mar-89  1615	CL-Characters-mailer 	cs proposal comments   
Received: from lucid.com by SAIL.Stanford.EDU with TCP; 24 Mar 89  16:15:33 PST
Received: from jack-jr ([192.9.200.25]) by heavens-gate.lucid.com id AA05450g; Fri, 24 Mar 89 16:10:08 PST
Received: by jack-jr id AA07339g; Fri, 24 Mar 89 16:13:53 PST
Date: Fri, 24 Mar 89 16:13:53 PST
From: Dave Unietis <dru@lucid.com>
Message-Id: <8903250013.AA07339@jack-jr>
To: cl-characters@sail.stanford.edu
Subject: cs proposal comments

I'm not going to be attending the X3J13 meeting, so Patrick and JonL will
be representing Lucid on the character subcommittee issues. The following
are what I consider to be the most important unresolved issues (the comments
apply to the version dated February 21, which is the latest I have received):


String Type Names

I am in favor of the existing proposal, which defines the type hierarchy:

               string                        
                 /\
                /  \
               /    \
     base-string   general-string

and the equivalent type hierarchy, for those strings that are "simple", 
i.e. meet the requirements on the bottom of p28 of CLtL:

           simple-string                        
                 /\
                /  \
               /    \
simple-base-string   simple-general-string


In David Moon's mail of Jan 24, he suggested that the above string type 
hierarchy be adopted, but that the corresponding simple versions be named
as follows:

              (unnamed)
                 /\
                /  \
               /    \
simple-base-string   simple-string


This is bad, for several reasons.  First, it is counterintuitive to use the
name simple-string in this manner.  In all other cases in Common Lisp, a type
called simple-<array-name> is exactly equivalent to the type <array-name>,
with the qualifications of CLtL p28.  Defining a simple-string as a simple
version of a general-string will likely cause a lot of confusion for users. 
Lucid's users, in particular, are very used to using simple-<array-name> as an
optimization of <array-name>, because only simple arrays have open-coded
accessors in Lucid's implementation. 

More important, this definition of simple-string changes its semantics in an
incompatible manner. The intent of the new type simple-base-string is to 
provide a space-optimized version of simple-string that retains all of its
(simple-string's) properties. In this version, if x is a simple-base-string, 
then (typep x 'simple-string) is not true, which seriously cripples this use
of simple-base-strings.  For example, 
	(simple-string-p (symbol-name (intern "FOO"))) => nil
with this definition and the symbol intern optimization suggested elsewhere
in the proposal.
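
A minimal sketch of the property at stake, assuming an implementation of
the later ANSI standard, where the element type is spelled BASE-CHAR
rather than BASE-CHARACTER. The function name is hypothetical.

```lisp
;; Under the hierarchy favored here, a simple-base-string is still a
;; simple-string, so code that tests SIMPLE-STRING-P keeps working when
;; strings are stored in the space-optimized representation.
(defun simple-base-string-still-simple-string-p ()
  (let ((s (make-string 3 :element-type 'base-char :initial-element #\a)))
    (and (typep s 'simple-base-string)
         (simple-string-p s))))

(simple-base-string-still-simple-string-p)
```

Under the alternative naming, the same test would fail for any string
stored in the base representation.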

The main argument against the proposed type hierarchy seems to be that
the name 'simple-general-string' itself is bad.  This is true - it shares
this property with simple-general-vector, where merging two independent
concepts into the prefix of a name produces an ugly result.  If the name
seems especially bothersome, then someone could try and come up with 
a name other than "general" to replace the use in both 'general-string' and 
'simple-general-string'.  

In short, I am in favor of the proposal as it stands in this area; I'm 
making these comments only if someone raises the issue again next week.


Registries and Repertoires

I'm afraid that I'm in sympathy with David Gray's objections to this 
part of the proposal, which are similar to the objections Larry had
at the January meeting.  I think the mechanisms that are defined are
on the right track, but I'm not sure that they by themselves provide any
measurable gain in portability.  Asking ISO to define character registry
names and contents that meet X3J13's requirements (non-overlapping, 
all names consisting solely of standard-char) seems hopeless to me.  
Either we should bite the bullet and define the repertoire names and character
labels ourselves where we can, or we should just flush this part of the
proposal.  I think it is far less important what names are adopted than that
some names or other are chosen. Some thought to extensibility (e.g. the Kanji
repertoire) also needs to be given.  Also, I'm not sure the non-overlapping
requirement is practical, despite its aesthetic merits.  


Minor Points

The name 'char-ccs-value' (p35) is bad and should be changed to something
simpler, like character-set-code or char-set-value.  Its arguments should help
make its functionality clear.  Also, the inverse function is needed - the name
find-char could be used if it ends up not being used with registries.

The keyword name :external-coded-character-format (p13) has been changed
from :external-code-format - I liked the old name better. 

On page 24-25, the discussion of what string and simple-string means for
"object-creation" should be flushed.  This discussion belongs only as part
of the relevant functions (make-string, make-sequence), as it really
isn't a fundamental property of the types.  For example, there is no
discussion under the definition of the types T or ARRAY of what T
or ARRAY means when specified as the first argument to make-sequence. 
Including this discussion in the type section of the document makes an already
complex issue more confusing.

On page 29, the penultimate bullet should be reworded - I assume this
really means attributes in all strings, not just those presented to the reader.
Also, another bullet is needed to indicate what happens when characters
with attributes are stored in files. This information might be worth including
on page 40, as well.


dave



∂26-Mar-89  2146	CL-Characters-mailer 	Really about TYPEP failures 
Received: from Riverside.SCRC.Symbolics.COM (SCRC-RIVERSIDE.ARPA) by SAIL.Stanford.EDU with TCP; 26 Mar 89  21:46:40 PST
Received: from F.ILA.Dialnet.Symbolics.COM (DIAL|16174944833) by Riverside.SCRC.Symbolics.COM via DIAL with SMTP id 326013; 27 Mar 89 00:44:44 EST
Received: from CALVARY.ILA.Dialnet.Symbolics.COM by F.ILA.Dialnet.Symbolics.COM via CHAOS with CHAOS-MAIL id 12127; Mon 27-Mar-89 00:45:13 EST
Date: Mon, 27 Mar 89 00:45 EST
From: Dennis L. Doughty <doughty@FUJI.ILA.Dialnet.Symbolics.COM>
Subject: Really about TYPEP failures
To: David A. Moon <Moon@Riverside.SCRC.Symbolics.COM>
cc: Jon L White <jonl%lucid.com@Riverside.SCRC.Symbolics.Com>, Baggins%IBM.COM@Riverside.SCRC.Symbolics.Com,
    CL-Characters%SAIL.STANFORD.EDU@Riverside.SCRC.Symbolics.Com, X3J13%SAIL.STANFORD.EDU@Riverside.SCRC.Symbolics.Com,
    Common-Lisp-Implementors@Riverside.SCRC.Symbolics.Com, KMP@Riverside.SCRC.Symbolics.Com,
    Palter@Riverside.SCRC.Symbolics.Com
In-Reply-To: <19890303033902.9.MOON@EUPHRATES.SCRC.Symbolics.COM>
Message-ID: <19890327054506.1.DOUGHTY@CALVARY.ILA.Dialnet.Symbolics.COM>

    Date: Thu, 2 Mar 89 22:39 EST
    From: David A. Moon <Moon@STONY-BROOK.SCRC.Symbolics.COM>
	Date: Sat, 4 Feb 89 09:27 EST
	From: Robert W. Kerns <RWK@FUJI.ILA.Dialnet.Symbolics.COM>
    
	Legitimate operations on BASE-CHARACTER do not
	produce characters in (AND CHARACTER (NOT BASE-CHARACTER)).
    
    I'm not sure which programs written using symbols in the LISP package
    are "legitimate operations" and which are not.  
What I had in mind was that the following function is not
legitimate:

(defun pick-a-char (char)
  (code-char (mod (+ (random char-code-limit)
                     (char-code char))
                  char-code-limit)))

In other words, manipulating characters AS CHARACTERS doesn't
extend the range beyond the domain.  Playing games with char-codes
can do all sorts of weird shit, but that's not part of the semantics
of the language.

						    However, I didn't see
    anything in the 21 Feb 89 proposal that says that CHAR-UPCASE of
    a BASE-CHARACTER is necessarily a BASE-CHARACTER, for example.
    That doesn't sound unreasonable, though; should it be added?

Good point.  Yes, I think so.

∂06-Apr-89  1507	CL-Characters-mailer 	Closure of BASE-CHARACTER   
Received: from STONY-BROOK.SCRC.Symbolics.COM by SAIL.Stanford.EDU with TCP; 6 Apr 89  15:07:10 PDT
Received: from EUPHRATES.SCRC.Symbolics.COM by STONY-BROOK.SCRC.Symbolics.COM via CHAOS with CHAOS-MAIL id 572982; Thu 6-Apr-89 18:07:12 EDT
Date: Thu, 6 Apr 89 18:07 EDT
From: David A. Moon <Moon@STONY-BROOK.SCRC.Symbolics.COM>
Subject: Closure of BASE-CHARACTER
To: CL-Characters@sail.stanford.edu
Message-ID: <19890406220703.4.MOON@EUPHRATES.SCRC.Symbolics.COM>

Forwarding a comment from a customer (actually it originated
from something RWK said):

It might be a good idea for the specification to say that certain
functions when applied to a BASE-CHARACTER return a BASE-CHARACTER.
It's already true that these functions applied to a STANDARD-CHAR
always return a STANDARD-CHAR.  Looking at CLtL chapter 13, the
list of functions would be just CHAR-UPCASE and CHAR-DOWNCASE.

What do you think?
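
For illustration, a sketch of how the closure property could be checked
empirically in a given implementation. UPCASE-CLOSURE-COUNTEREXAMPLES is a
hypothetical name, and the type to test is passed in because the spelling
of the base type differs between drafts.

```lisp
;; Walk every character code and collect the characters of TYPE whose
;; upper-case or lower-case counterpart is not itself of TYPE.
(defun upcase-closure-counterexamples (type)
  "List characters of TYPE whose CHAR-UPCASE or CHAR-DOWNCASE falls
outside TYPE."
  (loop for code from 0 below char-code-limit
        for char = (code-char code)   ; NIL when no character has this code
        when (and char
                  (typep char type)
                  (not (and (typep (char-upcase char) type)
                            (typep (char-downcase char) type))))
          collect char))

(upcase-closure-counterexamples 'standard-char)
```

An empty list for STANDARD-CHAR matches the statement above that the
property already holds there; a non-empty list for the base type would
show an implementation that the suggested rule would constrain.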

∂10-Apr-89  0034	CL-Characters-mailer 	Re: Closure of BASE-CHARACTER    
Received: from Xerox.COM by SAIL.Stanford.EDU with TCP; 10 Apr 89  00:34:25 PDT
Received: from Semillon.ms by ArpaGateway.ms ; 10 APR 89 00:34:25 PDT
Date: 10 Apr 89 00:33 PDT
From: masinter.pa@Xerox.COM
Subject: Re: Closure of BASE-CHARACTER
In-reply-to: David A. Moon <Moon@STONY-BROOK.SCRC.Symbolics.COM>'s message
 of Thu, 6 Apr 89 18:07 EDT
To: David A. Moon <Moon@STONY-BROOK.SCRC.Symbolics.COM>
cc: CL-Characters@sail.stanford.edu
Message-ID: <890410-003425-3565@Xerox>

While this is well-intentioned, I think it might conflict with some other
pieces of happenstance. For example, if you use an encoding which
sandwiches, say, the lower-case greek characters into the same set of 8-bit
codes as STANDARD-CHAR, you'd be tempted to make BASE-CHARACTER be
"characters that can be represented with 8 bits", but find that you really
wanted CHAR-UPCASE to map from lower-case greek to upper-case greek, even
though an upper-case greek letter might take more than 8 bits to represent
and wouldn't be BASE-CHARACTER.

CHAR-UPCASE should be semantics-driven. BASE-CHARACTER is an implementation
hack for efficient representation of STANDARD-CHAR in the face of a lot of
characters. Don't mix 'em.